On the use of Excel for reproducible tables

7 minute read

Published:

I sometimes see researchers doing much copy-pasting between output from Stata/R/MATLAB and Excel, to create tables. This is a lot of work, the source of many copy-paste errors, and a bad idea. I describe here one way to avoid this, even when the researcher does not want to invest in a new tool.

The typical table in economics papers will have up to 9 columns, and 5-25 rows. Seeking out, then copy-pasting up to 225 numbers per table is a lot of work, inhibits re-running of code (because “too complicated”), and is a source of errors. There are many packages out there that might allow you to construct a custom table in the programming language of your choice1, such as (Stata) estout,2 outreg2,3 regsave, 4, (R) stargazer,5, modelsummary,6 and many others in R, Julia, Python, etc.

Here, however, I want to emphasize a custom approach that leverages what most people already know: how to use cell references in Excel. I combine a simple export of regression output to Excel sheets, combined with an Excel template. The replication package would contain the (possibly pre-populated) Excel template, and voilà! A reproducible table is born, and updated every time the code is re-run. While you might still need to update the table in your paper (or correctly link an Excel sheet to Word so that it also auto-updates…)

A more complete version, including with downloadable sample code and any improvements and error corrections, is available at https://larsvilhuber.github.io/writing-r-to-excel/.

The Example

The included file regression_results.xlsx has a tab called “My funny table”, which references the content of the two other tabs:

My funny table

The content of the two tabs Regression1 and Regression2 will be filled with the output from two regressions. I have explicitly not set a seed, so that the values change very time you run the file. You can verify that the “funny table” is automatically updated with the latest results.

You can find the full code as a Quarto (R) file at https://github.com/larsvilhuber/writing-r-to-excel.

Setup

Load required packages and set up environment. The package openxlsx is used in preference to writexl because we explicitly want to preserve the contens of any existing (formatted!) Excel file.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openxlsx)
library(broom)

Data Generation

Create sample datasets for the regression analyses.

# Create sample data for first regression
set.seed(123)
data1 <- tibble(
  x1 = rnorm(100),
  x2 = rnorm(100),
  y1 = 2*x1 + 0.5*x2 + rnorm(100)
)

# Create sample data for second regression
data2 <- tibble(
  z1 = rnorm(100),
  z2 = rnorm(100),
  z3 = rnorm(100),
  y2 = 1.5*z1 - 0.8*z2 + 0.3*z3 + rnorm(100)
)

Analysis

Perform regression analyses on both datasets.

# Run regressions
reg1 <- lm(y1 ~ x1 + x2, data = data1)
reg2 <- lm(y2 ~ z1 + z2 + z3, data = data2)

# Convert regression results to tidy format
reg1_results <- tidy(reg1)
reg2_results <- tidy(reg2)

Save Results

Depending on what you want to do, you could simply post-process the existing tables, creating a simple dataset that is printed. Here, we save the individual results as datasets. If you were to split this document into three parts – Data generation, Analysis, and Tables – then this would be the last part of your analysis part.

# Save regression results as RDS files
saveRDS(reg1_results, "regression1_results.rds")
saveRDS(reg2_results, "regression2_results.rds")

Tables

Formatting Results via Excel

Here, however, we want to simulate what might be a far more complex table.

# Read data back in, as if we were a separate program

reg1_results <- readRDS("regression1_results.rds")
reg2_results <- readRDS("regression2_results.rds")

I use a function here, in order to handle the case when a file exists, and when one doesn’t. The very first time you would use this, no Excel file exists, but subsequent runs would overwrite the tabs within the existing file.

# Function to write to Excel, creating file if it doesn't exist
write_to_excel <- function(results_list, filename) {
  # Check if file exists
  if (!file.exists(filename)) {
    # Create new workbook if file doesn't exist
    message("Creating new file")
    wb <- createWorkbook()
  } else {
    # Load existing workbook if file exists
    message("Re-using existing file")
    wb <- loadWorkbook(filename)
  }
  
  # Loop through the results list and write/overwrite sheets
  for (sheet_name in names(results_list)) {
    # Remove sheet if it already exists
    if (sheet_name %in% names(wb)) {
      removeWorksheet(wb, sheet_name)
    }
    # Add new worksheet
    addWorksheet(wb, sheet_name)
    # Write data to worksheet
    writeData(wb, sheet_name, results_list[[sheet_name]])
  }
  
  # Save workbook
  saveWorkbook(wb, filename, overwrite = TRUE)
}

# Create list of results
regression_results <- list(
  "Regression1" = reg1_results,
  "Regression2" = reg2_results
)

# Write to Excel file
write_to_excel(regression_results, "regression_results.xlsx")

Display Results Here

Show the regression results in the document.

# Display results from first regression

knitr::kable(reg1_results)
term estimate std.error statistic p.value
(Intercept) 0.1350654 0.0961401 1.404882 0.1632508
x1 1.8668285 0.1048695 17.801445 0.0000000
x2 0.5238113 0.0989947 5.291307 0.0000008
# Display results from second regression

knitr::kable(reg2_results)
term estimate std.error statistic p.value
(Intercept) -0.1563012 0.1048648 -1.490501 0.1393709
z1 1.4305458 0.1009734 14.167548 0.0000000
z2 -0.7925999 0.1079181 -7.344456 0.0000000
z3 0.2206171 0.1141786 1.932210 0.0562816

Directory Contents

Show files created in the working directory.

# List all files in current working directory
list.files(pattern = "(xlsx|rds)$")
[1] "regression_results.xlsx"          "regression_results_template.xlsx"
[3] "regression1_results.rds"          "regression2_results.rds"         

Housekeeping

sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] broom_1.0.7      openxlsx_4.2.7.1 lubridate_1.9.4  forcats_1.0.0   
 [5] stringr_1.5.1    dplyr_1.1.4      purrr_1.0.2      readr_2.1.5     
 [9] tidyr_1.3.1      tibble_3.2.1     ggplot2_3.5.1    tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] gtable_0.3.6      jsonlite_1.8.9    compiler_4.4.2    renv_1.0.11      
 [5] zip_2.3.1         Rcpp_1.0.13-1     tidyselect_1.2.1  scales_1.3.0     
 [9] yaml_2.3.10       fastmap_1.2.0     R6_2.5.1          generics_0.1.3   
[13] knitr_1.49        backports_1.5.0   munsell_0.5.1     pillar_1.9.0     
[17] tzdb_0.4.0        rlang_1.1.4       utf8_1.2.4        stringi_1.8.4    
[21] xfun_0.49         timechange_0.3.0  cli_3.6.3         withr_3.0.2      
[25] magrittr_2.0.3    digest_0.6.37     grid_4.4.2        rstudioapi_0.17.1
[29] hms_1.1.3         lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.1   
[33] glue_1.8.0        fansi_1.0.6       colorspace_2.1-1  rmarkdown_2.29   
[37] tools_4.4.2       pkgconfig_2.0.3   htmltools_0.5.8.1
  1. The same basic toolkit can be used to construct the figure of your choice, but I’ll concentrate on tables here. 

  2. Jann, Ben (2005): Making regression tables from stored estimates. The Stata Journal 5(3): 288-308 and Jann, Ben (2007): Making regression tables simplified. The Stata Journal 7(2): 227-244. 

  3. Roy Wada, 2005. “OUTREG2: Stata module to arrange regression outputs into an illustrative table,” Statistical Software Components S456416, Boston College Department of Economics, revised 17 Aug 2014. 

  4. Julian Reif, 2008. “REGSAVE: Stata module to save regression results to a Stata-formatted dataset,” Statistical Software Components S456964, Boston College Department of Economics, revised 03 Dec 2023. 

  5. Hlavac M (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables. Social Policy Institute, Bratislava, Slovakia. R package version 5.2.3, https://CRAN.R-project.org/package=stargazer

  6. Arel-Bundock V (2022). “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software, 103(1), 1–23. https://doi.org/10.18637/jss.v103.i01