Authoring Scientific Publications with Rmarkdown

David M. Kaplan
2020-03-19

About

  • Objectives: Provide a short overview of writing scientific papers with Rmarkdown
  • Audience: Individuals already familiar with Rmarkdown who want to write scientific papers entirely in Rmarkdown
  • Author: The author's web site is here
  • The definitive version of this presentation is here

In a hurry

  • Check out the resources slide
  • Check out the various templates with different levels of complexity:

Why Rmarkdown?

  • Combine text with code for producing figures and tables
  • Reproducibility
  • Potentially beautiful formatting
  • Latex equations

Basic idea for publications

  • Rmarkdown produces Latex as an intermediate step when knitting to PDF
  • Most journals accept Latex submissions
  • Use special templates and Rmarkdown tricks to get correct formatting for publications

Look at Rmarkdown template

First, check out the basic Rmarkdown template here or the knitr output in PDF format.

Attractive, but missing a number of elements needed for a true publication.

Missing elements

  • True title page
  • Abstract, keywords
  • Section numbering
  • Figure captions
  • Table formatting and captions
  • Equations
  • Figures and tables at end of document
  • Cross-referencing figures, tables and sections
  • Citations and references
  • Formatting specific to journal
  • Save Latex output
  • Hide code

Missing elements

Parts that can be done in basic Rmarkdown:

  • Section numbering
  • Figure captions
  • Table formatting and captions
  • Equations
  • Figures / tables at end
  • Citations and references
  • Save Latex output
  • Hide code

Parts that require more advanced formatting:

  • Cross-referencing figures, tables and sections
  • Title page & abstract
  • Formatting specific to journal
  • Keywords, corresponding author

Resources

A few useful websites:

Essential packages:

Section numbering

  • Sections can be numbered in almost all formats with a small addition to the YAML header:

    ```yaml
    output: 
      pdf_document: 
        number_sections: yes
    ```

  • This generally works for all formats, but journal-specific formats may already have numbering turned on or off according to the journal's specifications
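With numbering turned on, Pandoc lets you exclude an individual section from the numbering by adding {-} (or equivalently {.unnumbered}) after its title, which is handy for, e.g., an acknowledgements section. A minimal sketch (the section titles are only examples):

    ```
    # Introduction

    # Acknowledgements {-}
    ```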

Figure captions

  • Figure captions can be added with the fig.cap chunk option:

    ```{r myfig, fig.cap="This is the caption"}
    plot(1:10)
    ```
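Because chunk options are R expressions, a caption can also be computed, for example to report a value from an earlier chunk so that the text stays in sync with the data. A minimal sketch, where the variable n and the chunk label myfig2 are hypothetical:

    ```{r}
    n = 10
    ```

    ```{r myfig2, fig.cap=paste("Plot of the first", n, "integers")}
    plot(1:n)
    ```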
    

Table formatting and captions

  • Unfortunately, there is no chunk option tab.cap for R chunks (there is one for SQL chunks).
  • Instead, you need to explicitly print the table with a caption from within the chunk

    • Use kable, xtable, etc.
    • I will only present kable

      ```{r}
      df = data.frame(id=1:5,res=letters[1:5])
      knitr::kable(df,caption="Table caption")
      ```
      
  • The booktabs option of kable seems to produce better-looking tables in Latex
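A minimal sketch of the booktabs option: kable passes booktabs=TRUE on to its Latex output, which then uses the horizontal rules of the booktabs Latex package. Loading that package through header-includes is included here as an assumption, in case your output format does not already load it:

    ```yaml
    header-includes:
      - \usepackage{booktabs}
    ```

    ```{r}
    df = data.frame(id=1:5,res=letters[1:5])
    knitr::kable(df, caption="Table caption", booktabs=TRUE)
    ```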

Equations

  • Equations are written using Latex math syntax
  • Two types: inline equations, which appear within the text, and “display” equations, which appear on a separate line
  • Inline example:
    • $E=mc^2$ produces: \( E=mc^2 \)
    • A single $ at the beginning and end of the equation code

Equations

  • A double $$ at the beginning and end of display equations
  • Display example:
$$
E=mc^2
$$
  • Produces:

\[ E=mc^2 \]

Equations

Equations can get a lot more complex:

$$
\frac{d\tilde{p}}{dx} \bigg|_{x_{\nu,\text{infl}}} = -\frac{\nu}{\alpha} \frac{1}{\nu+1} \left( \frac{\nu}{\nu+1} \right)^{\nu} = -\frac{1}{\alpha} \left( \frac{\nu}{\nu+1} \right)^{\nu+1}
$$

Produces:

\[ \frac{d\tilde{p}}{dx} \bigg|_{x_{\nu,\text{infl}}} = -\frac{\nu}{\alpha} \frac{1}{\nu+1} \left( \frac{\nu}{\nu+1} \right)^{\nu} = -\frac{1}{\alpha} \left( \frac{\nu}{\nu+1} \right)^{\nu+1} \]
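Multi-line display equations can be laid out with the Latex aligned environment inside $$...$$, where & marks the alignment point and \\ ends each line. A small sketch using a simple expansion:

    ```
    $$
    \begin{aligned}
    (a+b)^2 &= (a+b)(a+b) \\
            &= a^2 + 2ab + b^2
    \end{aligned}
    $$
    ```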

Resources: Site 1, Site 2

Figures and tables at end of document

  • Journals often want figures and tables at the end of the document
  • The paper is often easier to read this way
  • Journals may also want figure captions on a page separate from the figures
  • This can all be achieved with the endfloat Latex package (tables and figures are called floats because they float on the page)
  • Activate it in the YAML header:

    ```yaml
    header-includes:
      - \usepackage{endfloat}
    ```

Figures and tables at end of document

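  • Package options can alter the handling of table and figure floats at the end of the document (here, nomarkers removes the "Figure about here" placeholders from the text and tablesfirst places the tables before the figures):

    ```yaml
    header-includes:
      - \usepackage[nomarkers,tablesfirst]{endfloat}
    ```

  • See the Latex documentation of endfloat for more options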
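For instance, if a journal does not want endfloat's separate lists of figure and table captions, the nofiglist and notablist options should turn them off. A sketch combining them with the options above; check the endfloat documentation to confirm what your installed version supports:

    ```yaml
    header-includes:
      - \usepackage[nomarkers,tablesfirst,nofiglist,notablist]{endfloat}
    ```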

Citations and references

  • The easiest way to activate formatted references and citations is to use a BibTeX file and a CSL style file
  • 3 steps:
    • Adding appropriate lines to YAML header
    • Citing references in text
    • Adding a final section heading for the bibliography

Step 1: YAML header modifications

  • Add the following 2 lines to the YAML header:

    ```yaml
    bibliography: BIBLIOGRAPHY.bib
    csl: STYLE.csl
    ```

  • In a real case, one would replace BIBLIOGRAPHY.bib and STYLE.csl with the paths to a BibTeX file containing your references and a CSL style file determining how references are formatted
  • CSL style files for most journals are available at the Zotero style repository

How to generate a bibtex file?

  • Reference management programs (e.g., Zotero, Mendeley, EndNote) can export to BibTeX
  • I recommend Zotero: open source, non-corporate, and feature rich
  • The Better BibTeX for Zotero plugin makes Zotero a powerful tool for working with BibTeX:
    • Better citation keys
    • A local web address that serves a .bib export of the Zotero library and its subcollections. Rmarkdown documents can use this address with the download.file command to automatically update the .bib file
    • Ctrl+Shift+C copies the selected references as citations ready to paste
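A minimal sketch of the download.file idea: a chunk near the top of the document that refreshes the .bib file before Pandoc processes the references. The URL is a hypothetical placeholder; copy the actual address that Better BibTeX shows for your library or collection. Downloading to a temporary file first is a design choice so that the existing .bib file is kept if Zotero is not running:

    ```{r update-bib, include=FALSE}
    # Hypothetical address: replace with the one Better BibTeX gives you
    bib_url = "http://127.0.0.1:23119/better-bibtex/export/collection?/1/MyCollection.bibtex"
    tmp = tempfile(fileext = ".bib")
    # download.file returns 0 on success; treat any error as a failed download
    ok = tryCatch(download.file(bib_url, tmp, quiet = TRUE) == 0,
                  error = function(e) FALSE)
    # Only overwrite the bibliography when the download succeeded
    if (ok) file.copy(tmp, "BIBLIOGRAPHY.bib", overwrite = TRUE)
    ```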

Step 2: Citing references

  • References can be cited in documents as follows:
This is an important result deserving citation [@CITEKEY; @OTHERCITEKEY].

@CITEKEY showed important things.
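  • where CITEKEY and OTHERCITEKEY are the citation keys found on the first line of each entry in the .bib file.
  • The presence or absence of [] around the citation changes its form: with an author-date style, [@CITEKEY] gives a parenthetical citation such as (Author 2020), whereas @CITEKEY gives an in-text citation such as Author (2020).
  • Better BibTeX for Zotero makes inserting these easy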
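Pandoc's citation syntax offers a few more variants: prefixes and page locators inside the brackets, and a minus sign that suppresses the author name when the author is already named in the sentence. A sketch using the same placeholder keys:

    ```
    Several studies agree [see @CITEKEY, p. 33; also @OTHERCITEKEY].

    @CITEKEY [p. 33] showed important things.

    John Doe showed important things [-@CITEKEY].
    ```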

Step 3: Final section for bibliography

The bibliography with cited references will automatically be placed at the end of the output document (but before any floats placed at the end by endfloat). Therefore, one generally ends the Rmarkdown document with a section header for the references:

# References
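If something such as an appendix has to come after the references, Pandoc also lets you choose where the bibliography goes by placing an empty div with the id refs at that spot. A sketch (the appendix heading is only an example):

    ```
    # References

    <div id="refs"></div>

    # Appendix
    ```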

Saving Latex output

  • Most journals do not accept a .Rmd file, but rather require that you submit a Latex file.
  • Rmarkdown (via knitr and Pandoc) generates a Latex file as an intermediate step when building the final PDF output.
  • You can save this .tex file with the extra output format option keep_tex:

    ```yaml
    output: 
      pdf_document: 
        number_sections: yes
        keep_tex: yes
    ```

Hide code

One can hide all code in a document by setting echo=FALSE in the setup chunk at the start of the document:

    ```{r setup, include=FALSE}
    knitr::opts_chunk$set(echo = FALSE)
    ```
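The same call can set other defaults, and individual chunks can still override them. A sketch, used in place of the setup chunk above, that also hides messages and warnings (assuming you want them out of the paper) and then shows the code of one particular chunk anyway; the chunk label show-me is hypothetical:

    ```{r include=FALSE}
    knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
    ```

    ```{r show-me, echo=TRUE}
    summary(1:10)
    ```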

Result so far

You can check out a document employing the approaches mentioned so far here, along with its PDF output and the associated .bib file.

Cross-referencing figures, tables and sections

  • Cross-referencing figures, tables, sections and equations can be done without using special Rmarkdown formats
  • But it is simpler, more consistent and more portable using bookdown
  • I will first show without and then with bookdown

Section references

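  • Section references change little with and without bookdown. Pandoc automatically gives each section an identifier derived from its title (lowercase, with spaces replaced by hyphens), for example: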
# My section title
  • Then you would cross-reference it anywhere in the document as follows:
Section \ref{my-section-title}
  • You can also give a section an easier label:
# My section title {#sect1}

Section \ref{sect1}
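With bookdown (presented below), the same identifiers are referenced with \@ref() instead of \ref, which also works for Word and HTML output. A sketch reusing the labels above:

    ```
    Section \@ref(my-section-title)

    Section \@ref(sect1)
    ```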

Table and figure cross-references

  • Achieved by inserting Latex \label commands in the captions:

    ```{r fig.cap="\\label{fig:a_fig}Figure caption"}
    plot(1:10)
    ```
    
    ```{r}
    df = data.frame(id=1:5,res=letters[1:5])
    knitr::kable(df,caption="\\label{tab:a_tab}Table caption")
    ```
    
    Fig. \ref{fig:a_fig}, Table \ref{tab:a_tab}
    
  • Note the double backslash (\\) inside the quotes: R treats a single backslash as the start of an escape sequence.

Equation cross-references

  • $$...$$ does not permit Latex \label
  • Instead wrap equation in \begin{equation}...\label{eq:a_eq}\end{equation}
\begin{equation}
E=mc^2
\label{eq:a_eq}
\end{equation}

Eq. \ref{eq:a_eq}
  • Without bookdown, the label prefixes fig:, tab: and eq: are only a naming convention and are optional
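If you prefer the equation number to appear in parentheses, as in (1), the amsmath command \eqref can be used in place of \ref. This sketch assumes amsmath is loaded, as it is in Pandoc's default Latex template:

    ```
    Eq. \eqref{eq:a_eq}
    ```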

Result so far

You can check out a document employing the approaches mentioned so far here, along with its PDF output.

Bookdown

  • The above works only when generating a PDF; it fails for Word or HTML output.
  • The bookdown package is a powerful tool for writing books with Rmarkdown.
  • It also facilitates cross-referencing
  • Figure and table labels are derived from the chunk label
  • The cross-referencing syntax also works with Word and HTML output

Bookdown

  • Use bookdown in Rmarkdown by modifying the output format:

    ```yaml
    output:
      bookdown::pdf_document2:
        df_print: kable
        keep_tex: true
        number_sections: yes
        toc: no
    ```

  • bookdown::word_document2 and bookdown::html_document2 formats also exist
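Several output formats can be listed in the same YAML header; RStudio's Knit menu then offers each of them, and rmarkdown::render() uses the first one unless told otherwise. A sketch, where default means the format's default options:

    ```yaml
    output:
      bookdown::pdf_document2:
        keep_tex: true
        number_sections: yes
      bookdown::word_document2: default
      bookdown::html_document2: default
    ```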

Bookdown: Table and figure references

  • Figures and tables automatically get a label based on chunk name

    ```{r fig1,fig.cap="Figure caption"}
    plot(1:10)
    ```
    
    ```{r tab1}
    df = data.frame(id=1:5,res=letters[1:5])
    knitr::kable(df,caption="Table caption")
    ```
    
    Fig. \@ref(fig:fig1), Table \@ref(tab:tab1)
    
  • Bookdown does not handle special characters (e.g., '_') in chunk labels well; sticking to letters, numbers and dashes is safest

  • The prefixes tab: and fig: are obligatory
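Captions containing Markdown formatting or citations can be awkward to write inside a chunk option. bookdown's text references handle this: write the caption as its own paragraph (with blank lines around it) starting with a (ref:label) tag, then use that tag as the caption. A sketch in which the label fig2-cap, the chunk label fig2 and the citation key are placeholders:

    (ref:fig2-cap) A *simple* plot, following @CITEKEY.

    ```{r fig2, fig.cap="(ref:fig2-cap)"}
    plot(1:10)
    ```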

Bookdown: Equation references

  • Wrap equation in \begin{equation}...(\#eq:eq1)\end{equation}
\begin{equation}
E=mc^2
(\#eq:eq1)
\end{equation}

Eq. \@ref(eq:eq1)
  • Again, the eq: prefix is obligatory, and the label should not contain special characters
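bookdown also accepts labels on individual lines of multi-line environments such as align, so that each line gets its own number. A sketch with hypothetical labels eq2 and eq3, placed at the end of each line:

    ```
    \begin{align}
    y &= a + bx (\#eq:eq2) \\
    z &= c + dx (\#eq:eq3)
    \end{align}

    Eqs. \@ref(eq:eq2) and \@ref(eq:eq3)
    ```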

Title page and abstract

  • Bookdown allows you to do more with the author: field of the YAML header
    • Multiple authors
    • Corresponding author
    • Addresses
  • You can also add an abstract: field

    ```yaml
    author: |
      | John Doe $^1$^[Corresponding author: john.doe@nowhere.org], Jane Smith $^{2,3}$
      |
      | $^1$ Address number 1
      | $^2$ Address number 2
      | $^3$ Address number 3
    abstract: |
      | This is a small abstract.
      |
      | It has two paragraphs.
    ```

Line spacing and numbers

  • Line spacing and line numbering can be controlled by adding Latex packages to the header-includes field of the YAML header:

    ```yaml
    header-includes:
      - \usepackage{endfloat} # From previous
      - \usepackage{setspace}\doublespacing
      - \usepackage{lineno}
      - \linenumbers
    ```
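Both packages offer useful variants: setspace also provides \onehalfspacing, and the modulo option of lineno prints a number only on every fifth line. A sketch combining the two:

    ```yaml
    header-includes:
      - \usepackage{setspace}\onehalfspacing
      - \usepackage[modulo]{lineno}
      - \linenumbers
    ```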

Result so far

You can check out a document employing the approaches mentioned so far here, along with its PDF output.

Formatting specific to journal

  • The final step to making something publication ready is to fit the PDF to a specific journal's format
  • Most major journals and/or publishing houses provide Latex style files for writing papers in Latex that follow the journal's format specifications
  • More and more of these have been adapted for use with Rmarkdown in the package rticles
  • rticles can be used on its own, but works best when wrapped in bookdown

rticles standalone use

  • rticles can be used all by itself
  • Install the package
  • When creating a new Rmarkdown document in RStudio, choose to create it from a template; many journal and publisher formats should then appear as potential templates
  • Choose your journal
  • If your journal is not available, it may be possible to create an Rmarkdown template for it
    • I did it for Oxford University Press journals
  • Template features vary somewhat by journal
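Outside RStudio, the same templates can be used from the R console with rmarkdown::draft(). A sketch for the Elsevier template; the file name paper.Rmd is arbitrary, and the template name must match one shipped with your installed version of rticles:

    ```r
    install.packages("rticles")
    # Create paper.Rmd from the rticles Elsevier template
    # (depending on the template, it may be placed in a new directory)
    rmarkdown::draft("paper.Rmd", template = "elsevier", package = "rticles")
    ```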

rticles wrapped in bookdown

  • The bookdown::pdf_book output format permits you to specify a base_format
  • For base format, you can use, e.g., rticles::elsevier_article, to create an Elsevier article with bookdown advantages for cross-referencing
  • Strategy:
    • Create a new Rmarkdown document based on template for journal of interest
    • Modify the output format to use bookdown

    ```yaml
    output: 
      bookdown::pdf_book:
        base_format: rticles::oup_article
        keep_tex: yes
        df_print: kable
    ```

Title page, abstract, keywords

  • Each journal format differs a little in how a full title page is created
  • The template generally indicates how to enter information in the YAML header
  • I am going to use the oup_article template as I know it well, but others can easily be adapted

Standalone Rmarkdown documents

  • An Rmarkdown document contains the text, equations and code for figures, tables and results
  • But it usually lacks the data, so it is not standalone
  • I created the knitrdata package to solve this problem

    ```{r}
    library(knitrdata)
    ```
    
    ```{data output.var="d",load.function=read.csv}
    a,b
    1,2
    3,4
    ```
    
    ```{r}
    d
    ```
    
  • Binary and encrypted data are also possible (see the package vignette)
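Instead of loading the data into a variable, a data chunk can also write its contents to a file with the output.file chunk option, which is convenient when existing code expects to read a file. A sketch assuming knitrdata has been loaded as above, with mydata.csv as a hypothetical file name:

    ```{data output.file="mydata.csv"}
    a,b
    1,2
    3,4
    ```

    ```{r}
    d2 = read.csv("mydata.csv")
    d2
    ```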

Collaborating on a Rmarkdown paper

  • The one major advantage of writing a paper in Word is Track Changes
  • Currently no true equivalent exists in Rmarkdown, though various workarounds have been proposed.
  • Alternatives involve either:
    • Working collaboratively on a cloud-hosted Rmarkdown document
    • Round trips to and from Word

Collaborating on a Rmarkdown paper: Details

  • Option #1: Host the .Rmd document on GitHub, Google Drive or similar, and have all authors work directly in Rmarkdown.
    • The rmdrive package may be helpful.
    • Downside: Requires that all authors are comfortable with Rmarkdown.
  • Option #2: Generate a Word version of the paper with bookdown::word_document2 that authors can edit.
    • Commenting out the bibliography and csl entries in the YAML header leaves citations in the Word file as raw @CITEKEY text, which may make it easier to copy modified text back into the .Rmd document
    • Downside: Merging must be done by hand.
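A sketch of a YAML header for such a Word round trip, using the placeholder file names from the citations section; the leading # turns the two lines into YAML comments:

    ```yaml
    output: bookdown::word_document2
    # bibliography: BIBLIOGRAPHY.bib
    # csl: STYLE.csl
    ```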

Final result

You can check out a document employing all approaches mentioned here, along with its PDF output.