Basics of Searching and Reading Scientific Literature


Searching the scientific literature is a critical skill in research, and it is something you should practice throughout your time in the VIP course. Reading scientific publications from others helps you understand the technical topics that are relevant to your work, and it also gives you experience in how to communicate scientific findings effectively. If your project is very successful and you are motivated to write, you might even be able to help write a new contribution to the scientific literature through this course!

Basic Search Strategies#

Absolute Searches
Relative Searches
Continuous Updates
AI-Powered Searching

Absolute Searches#

Use databases such as Web of Science and/or Scopus.
Search for terms that you think are relevant to your interests.
- Use quotation marks and “wildcard” characters (*, ?)
- Constrain to reviews when learning a new topic.
- Use “exclude” functionality to avoid unwanted topics.
- Remember that you can “refine” searches.
Sorting is important!
- Most relevant.
- Most recent.
- Highest cited.
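
To get a feel for what a wildcard pattern will pull in before you run it against a database, it can help to try it on a small word list. The sketch below uses Python's shell-style matching as a stand-in: `*` matches any run of characters (including none) and `?` matches exactly one character. The word list is made up for illustration, and the exact wildcard rules differ between databases, so check the help pages of Web of Science or Scopus before relying on this.

```python
from fnmatch import fnmatchcase


def matching_terms(pattern, vocabulary):
    """Return the words in `vocabulary` that a wildcard pattern matches.

    Shell-style matching stands in for database wildcards here:
    `*` = any run of characters (including none), `?` = exactly one character.
    """
    return [w for w in vocabulary if fnmatchcase(w.lower(), pattern.lower())]


# Hypothetical vocabulary, for illustration only.
words = [
    "catalyst", "catalysis", "catalytic", "catalog",
    "woman", "women", "adsorption", "absorption",
]

print(matching_terms("cataly*", words))     # ['catalyst', 'catalysis', 'catalytic']
print(matching_terms("wom?n", words))       # ['woman', 'women']
print(matching_terms("a?sorption", words))  # ['adsorption', 'absorption']
```

Note how `cataly*` avoids "catalog" while still catching three related forms. A pattern that is too short, such as `cat*`, would match far more than you want.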

Relative Searches#

Start with a review paper or a few very relevant papers.
Go through the reference list, particularly the introductory references.
- Use Crossref Metadata search to find articles from citations.
- Some journal sites will have links to cited works.
- Look for commonly cited authors.
Many journal websites will have a “cited by” column. This can be very valuable.
Search Web of Science or Scopus by author, including keywords as necessary.
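
If you have many citations to look up, the Crossref lookup can also be scripted. The sketch below is a minimal example that assumes the public Crossref REST API (the `works` endpoint with the `query.bibliographic` and `rows` parameters) and the JSON layout it returned at the time of writing. The function names are my own, and `find_citation` needs a network connection, so treat it as a starting point rather than a finished tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(citation_text, rows=3):
    """Build a Crossref URL that searches for a free-text citation."""
    params = {"query.bibliographic": citation_text, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)


def summarize_items(response):
    """Extract (DOI, title, cited-by count) from a decoded Crossref response."""
    summaries = []
    for item in response["message"]["items"]:
        # Crossref stores the title as a list; it may be missing or empty.
        title = (item.get("title") or ["(no title)"])[0]
        count = item.get("is-referenced-by-count", 0)
        summaries.append((item.get("DOI"), title, count))
    return summaries


def find_citation(citation_text, rows=3):
    """Query Crossref over the network and summarize the top hits."""
    url = crossref_query_url(citation_text, rows)
    with urlopen(url, timeout=30) as reply:
        return summarize_items(json.load(reply))


# Example call (requires internet access):
# find_citation("Wellendorff 2012 Density functionals for surface science")
```

The cited-by count in each result gives a rough way to spot the heavily cited works in a reference list, although it only counts citations that Crossref knows about.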

Continuous Updates#

Once you get an overview of a branch of literature you still need to stay current!
Different communities have different feeds.
- For machine learning, materials, and quantum mechanics I recommend this feed curated by Citrine: https://citrine.us3.list-manage.com/subscribe?u=014639e57e11aa6a7d2f2e4a8&id=60e6d7fee0
- You can set up alerts for Google Scholar searches.

AI-Powered Searching#

AI-powered searching is new (as of 2023) and powerful, but dangerous. There are many emerging tools that claim to use artificial intelligence (AI) to accelerate literature searches. Some seem promising, but others fail miserably. I strongly recommend getting familiar with the basic “manual” search strategies above before venturing into these, but I realize that the allure is very strong. Here I will highlight two tools: one that works reasonably well, and one that fails in the most dangerous possible way.

Perplexity - This AI-powered search is connected to the internet, which generally allows it to find real references. A good strategy is to ask it to “Write a well-referenced paragraph about...”, which gives some context along with citations. Note that many of the sources will not be peer-reviewed, but some may be, and it can help you navigate the jargon of an unfamiliar field. However, you should always carefully review any reference yourself before citing it to ensure that the tool did not make a mistake.
ChatGPT - This AI tool made large language models famous, but it is a terrible failure at searching the literature. It will make up very convincing references that simply do not exist. If you tell it they don’t exist, it will apologize and make up some more. This process can be repeated ad nauseam, but in my experience it has never found a real, useful reference. These “false positives” are very dangerous in AI-powered literature searches, so proceed with caution.
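
One quick sanity check on an AI-suggested reference is to see whether its DOI is actually registered. The sketch below assumes the public Crossref REST API, which answers a request for an unknown DOI with HTTP 404. The function names are my own, and the check needs a network connection. A `False` result only means Crossref has no record (a few DOIs are registered with other agencies), and a `True` result only means the DOI exists, not that the paper says what the tool claims, so you still need to read it.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen


def crossref_record_url(doi):
    """Build the Crossref URL for a single DOI (slashes are kept as-is)."""
    return "https://api.crossref.org/works/" + quote(doi.strip())


def doi_is_registered(doi):
    """Return True if Crossref has a record for `doi`, False on HTTP 404.

    Any other HTTP error (rate limiting, server trouble) is re-raised so
    that it is not mistaken for a missing reference.
    """
    try:
        with urlopen(crossref_record_url(doi), timeout=30):
            return True
    except HTTPError as error:
        if error.code == 404:
            return False
        raise


# Example call (requires internet access):
# doi_is_registered("10.1103/physrevb.85.235149")
```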

Evaluating Journals#

Impact factor is the canonical measure of a journal's importance. Well-known high-impact journals include:
- Nature, Science, PNAS
- Nature family journals
- JACS (chemistry), Physical Review (physics)

Eigenfactor.org is a useful tool for visualizing journal impact.
Different fields have different metrics. For example, peer-reviewed conferences are highly prestigious in computer science and mathematics, and some of the most impactful papers in those fields are posted on non-peer-reviewed pre-print servers like arXiv.

Managing References#

Drinking from the firehose: Entering a new field will lead to information overload.
- Abstracts are your friend. Use them to differentiate useful from non-useful papers with minimal overhead.
- Learn to skim papers before reading them deeply.
- Prioritize the papers you read carefully.
Storing and Organizing
- Mendeley offers good cross-platform literature management and sharing.
- Project-specific BibTeX files.
- Various other software is available.
There are some “recommender” services through Mendeley, and others, that can be used to leverage your existing library to keep up with new papers.
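
If you go the project-specific BibTeX route, each paper becomes one entry in a plain-text `.bib` file. The sketch below shows one way to generate such an entry from Python. The helper `to_bibtex` and the citation key are my own choices, and most reference managers can export entries like this for you. The author list is abbreviated with BibTeX's "and others" convention because the full list is not reproduced here.

```python
def to_bibtex(key, entry_type, fields):
    """Format one BibTeX entry from a dict of field names to values."""
    lines = ["@" + entry_type + "{" + key + ","]
    for name, value in fields.items():
        lines.append("  " + name + " = {" + str(value) + "},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex(
    "wellendorff2012",  # citation key: any unique label you like
    "article",
    {
        "author": "Wellendorff, J. and others",
        "title": "Density functionals for surface science: "
                 "Exchange-correlation model development with "
                 "Bayesian error estimation",
        "journal": "Physical Review B",
        "volume": 85,
        "number": 23,
        "year": 2012,
        "doi": "10.1103/physrevb.85.235149",
    },
)
print(entry)
```

Appending the printed text to a file such as `project.bib` (a hypothetical name) keeps the references for one project together and under version control alongside your code.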

A basic algorithm for searching a new subject#

1. Absolute search: Identify <10 key papers
- Review papers (most highly cited)
- Most relevant papers
2. Relative search: Identify <5 top authors
- Use citations from key papers
- Find most highly cited relevant papers from these authors
3. Absolute search: Refine search terms based on current reading
- Most relevant papers
- Newest papers
4. Repeat steps 2-3 as necessary.

Most topics can be summarized in <100 well-chosen papers.
- 1-3 review papers
- All highly-cited older papers
- Newest high-impact papers (Nature, Science, PNAS, Phys. Rev.)
- Balance of technical vs. perspective
- Balance of theory/experiment
- Balance of different authors/groups

Reading a Paper: Typical Paper Format#

Introduction
- Motivation for why the work is important
- Context of previous work in related areas
Results & Discussion
- Results of experiments/calculations
- Analysis of data
- Discussion of hypotheses verified
- Discussion of relevance to larger problem(s)
Experimental/Methods
- Detailed explanation outlining how the work could be repeated
Conclusion
- Summary of importance of work
- Summary of question answered/hypothesis verified
Appendices/Supplementary Information
- Data and discussion that is only relevant for repeating work
- Discussion of details/decisions that would not affect conclusions

Basic Reading Strategy#

Zooming In#

Iteration 1: Skim the paper
- Read abstract
- Look at figures and captions
- Look for relevant equations/tables
- Read conclusion
Iteration 2: Read the paper
- Read introduction, results, discussion
- Skim experimental/methods
- Take notes! Store relevant keywords in citation manager.
Iteration 3: Deep Dive
- Read experimental/methods
- Read supplementary information
- Implement equations/methods

Arbitrary guideline for “zooming in”#

Skim about 70-90% of papers from a search
Read about 50% of the papers you skim
Deep dive about 10% of the papers you read
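
To see what these percentages mean in practice, here is the arithmetic as a short sketch. The function name and the default arguments are mine; they simply encode the guideline above.

```python
def reading_funnel(n_results, skim_pct=(70, 90), read_pct=50, deep_pct=10):
    """Estimate (low, high) paper counts at each stage of "zooming in"."""
    skim = tuple(n_results * p / 100 for p in skim_pct)
    read = tuple(s * read_pct / 100 for s in skim)
    deep = tuple(r * deep_pct / 100 for r in read)
    return {"skim": skim, "read": read, "deep_dive": deep}


print(reading_funnel(100))
# {'skim': (70.0, 90.0), 'read': (35.0, 45.0), 'deep_dive': (3.5, 4.5)}
```

So a search that returns 100 papers leads to roughly 70-90 skims, 35-45 careful reads, and only about 4 deep dives.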

Directions for Literature Search for VIP students:#

At least every two weeks, you should use the strategies outlined above (or other strategies) to identify a peer-reviewed scientific paper that is relevant to your VIP project. First, skim the paper and answer the “Skimming questions” outlined below. Next, read the paper and answer the “Reading questions”. Overall, you should spend 30-60 minutes on each paper, and you should use the questions below as a guideline for discussing the paper. Include your responses in each bi-weekly update.

Questions for Literature Review Assignment:#

Skimming questions:#

Summarize the overall goal of the work.#

What figure is most interesting or relevant? What is the main thing you learned from it?#

Reading questions:#

What is the strongest point of the paper?#

What is the weakest point of the paper?#

How does the paper relate to your research project?#

An Example:#

Name: AJ Medford
Paper: Wellendorff, J., Lundgaard, K. T., Møgelhøj, A., Petzold, V., Landis, D. D., Nørskov, J. K., … Jacobsen, K. W. (2012). Density functionals for surface science: Exchange-correlation model development with Bayesian error estimation. Physical Review B, 85(23). doi:10.1103/physrevb.85.235149

Skimming Questions:

1) Summarize the overall goal of the work.

A methodology for semiempirical density functional optimization, using regularization and cross-validation methods from machine learning, is developed, and a general-purpose density functional for surface science and catalysis studies is introduced. The functional is evaluated for accuracy in describing bond breaking and formation in chemistry, lattice constants from solid state physics, and adsorption energies for surface chemistry.

2) What figure is most interesting or relevant? What is the main thing you learned from it?

Figure 5 was most interesting because it demonstrates that the approach yields an *ensemble* of approximations.

Reading Questions:

3) What is the strongest point of the paper?

The fact that the functional has a built-in estimate of the error that can generalize to systems outside the training space.

4) What is the weakest point of the paper?

The model space they use is not complete, so the error of the functionals is still relatively large.

5) How does the paper relate to your research project?

This paper uses a combination of quantum-mechanical simulations (DFT) and big-data techniques to establish a new way of doing DFT simulations.