Preparing your code for computational verification

Published: December 04, 2025

The steps in this document are being used in a pilot project.

This document describes how to prepare your code for verification, taking into account some of the most frequent issues that the Data Editor and his team have encountered in submitted replication packages.

⚠️❗ IMPORTANT: At this point, you should only be seeing this page if you were asked by the Data Editor team to do so, and if your replication package relies on a single software. Admissible containers are listed in the Step 5 section: authorized containers. We are not currently attempting to generalize this to multi-software replication packages, though it is possible to do so.

Overview

We will describe a few checks and edits you should make to your code in order to ensure maximum reproducibility. We then describe how to test for reproducibility before submitting to the Data Editor. All steps have been tested by undergraduate replicators, and should be easy to implement. How much time they take depends on your specific code, but generally, these adjustments can be made very quickly by somebody with good knowledge of the code base.

Much more extensive guidance on the issues addressed here is available at https://larsvilhuber.github.io/self-checking-reproducibility/. We reference specific chapters there at each of the steps.

⚠️❗ IMPORTANT: All but the last step can be done by anybody, with no special system requirements, and independently of your ability to share confidential data. However, the last step may not be possible at an institution that does not allow you to install container software (Docker, OrbStack, etc.) and does not have such technology installed on a Linux cluster. We provide a public website where you can leverage containers for verification, but you should not use it for confidential data. In that case, please complete all the other steps.

Checklist

Print off (as PDF or on paper) the following checklist, and tick off each item as you complete it. Provide the completed checklist as part of the replication package.

Detailed instructions

Preliminary: Directory structure of a replication package

A generic replication package, housed at /my/computer/users/me/project, might have the following structure:

README.pdf
data/
   raw/
      cps0001.dat
   analysis/
      combined_data.dta
      combined_data.csv
      combined_data_codebook.pdf
code/
  01_readcps.do
  02_readfred.do
  03_table1-5.do
  04_figures1-4.do
results/
  table1.tex
  table2.xlsx
  ...
  figure1.png
  figure2.pdf

where

data/raw has the externally acquired raw data files (not modified by the authors)
data/analysis has the processed data files, generated by the code in this repository.
code has the code files.
results has all the results files.

For illustration purposes, we have used Stata .do files, and outputs in a variety of formats, but the same principles apply to other software, and to any output formats.
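As a minimal sketch, the generic layout above could be created from a terminal as follows. The helper name make_skeleton and the use of a scratch directory are illustrative assumptions, not part of any official template.

```shell
# Sketch: create the generic directory layout described above.
# make_skeleton is a hypothetical helper name.
make_skeleton() {
  skeleton_root="$1"
  mkdir -p "$skeleton_root/data/raw" "$skeleton_root/data/analysis" \
           "$skeleton_root/code" "$skeleton_root/results"
}

# Demonstration in a scratch directory, so nothing existing is touched.
demo_root="$(mktemp -d)"
make_skeleton "$demo_root"
ls "$demo_root"
```

In a real project, you would pass your project directory instead of the scratch directory.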

Note that we did not specify where the main.do file will be!

Short-cut

If you want to include the key code pieces for Stata that are needed to comply with Steps 1-3, you can use this code fragment. Note that you do not HAVE to use this specific code, if your code already has equivalent features!

Step 1: Main file

You may or may not have a main file. The following should be adapted to your circumstances. You do not need to create a file that is called main.do if you already have one, but you may need to update your existing main file.

Reference: https://larsvilhuber.github.io/self-checking-reproducibility/02-hands_off_running.html

Creating a single main file is straightforward. However, you will want to make some minor edits depending on where, in the above template setup, the file is located:

Scenario A: `main` is in the `code` directory

The most frequent scenario we see (which we call Scenario A) amongst economists is that the main file is in the code directory:

README.pdf
data/
...
code/
  main.do
  01_readcps.do
  02_readfred.do
...

In this case, the following generic main file will work, with scenario set to "A".

local scenario "A"          // Scenario A: main is in code directory
local pwd : pwd                     // This always captures the current directory

if "`scenario'" == "A" {             // If in Scenario A, we need to change directory first
    cd ..
}
global rootdir : pwd                // Now capture the directory to use as rootdir
display in red "Rootdir has been set to: $rootdir"
cd "`pwd'"                            // Return to where we were before and never again use cd

// Now run the rest of the code
do "$rootdir/code/01_readcps.do"
do "$rootdir/code/02_readfred.do"
do "$rootdir/code/03_table1-5.do"
do "$rootdir/code/04_figures1-4.do"

Scenario B: `main` is in the top-level directory

More common in other computational sciences, but also present amongst economists, is that the main file is in the top-level directory:

README.pdf
main.do
data/
...
code/
  01_readcps.do
  02_readfred.do
...

In this case, the following generic main file will work, with scenario set to "B" (though see Step 3: Dependencies).

local scenario "B"          // Scenario B: main is in project top-level directory
local pwd : pwd                     // This always captures the current directory

if "`scenario'" == "A" {             // If in Scenario A, we need to change directory first
    cd ..
}
global rootdir : pwd                // Now capture the directory to use as rootdir
display in red "Rootdir has been set to: $rootdir"
cd "`pwd'"                            // Return to where we were before and never again use cd

// Now run the rest of the code
do "$rootdir/code/01_readcps.do"
do "$rootdir/code/02_readfred.do"
do "$rootdir/code/03_table1-5.do"
do "$rootdir/code/04_figures1-4.do"

Important

In neither scenario did we hard-code the path to our project directory /my/computer/users/me/project. This is not an omission, and it is important, because it allows the code to be run on any computer, without modification.
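One way to check for leftover hard-coded paths is to search the code files for patterns that look like absolute paths. The following is a rough sketch only: the helper name find_hardcoded_paths and the three patterns (Windows drive letters, /Users/, /home/) are assumptions, and they will not catch every case.

```shell
# Sketch: flag lines that look like hard-coded absolute paths.
# find_hardcoded_paths is a hypothetical helper; the patterns are illustrative.
find_hardcoded_paths() {
  # $1 = directory holding the code files
  grep -rnE '(^|[^A-Za-z])[A-Za-z]:[\\/]|/Users/|/home/' "$1"
}

# Demonstration with two small files in a scratch directory.
demo_code="$(mktemp -d)"
printf '%s\n' 'use "$rootdir/data/analysis/combined_data.dta", clear' > "$demo_code/good.do"
printf '%s\n' 'use "C:\Users\me\project\data\cps.dta", clear' > "$demo_code/bad.do"

# Only bad.do should be reported.
find_hardcoded_paths "$demo_code"
```

Any line reported this way is a candidate for rewriting with $rootdir (or the equivalent in your language).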

Finally, you should not hard-code your rootdir. Set the project root directory dynamically:

Stata

global rootdir : pwd

R

# if using the here package:
rootdir <- here::here()
# or the rprojroot package
rootdir <- rprojroot::find_root(rprojroot::has_file("README.pdf"))  # or other marker file

IMPORTANT: your code MUST contain the line (Stata) global rootdir : pwd (or equivalent) to set the project root directory dynamically.

Step 2: Path names and case

Two issues:

Windows computers use \ (backslashes) in path names, while Mac and Linux computers use / (forward slashes). The use of \ (backslashes) in path names breaks code on Mac and Linux computers.
Windows and Mac computers use case-insensitive file systems, while Linux computers use case-sensitive file systems.

Both of these issues need to be addressed. You are helped by a straightforward but often forgotten (or unknown) observation:

Every statistical programming language can use generic path names using / (forward slashes). This ensures wide reproducibility.

About 40% of replication packages in economics appear to be submitted by researchers using computers running macOS or Linux. With a bit of simplified math, if we believe that share is representative of future replicators, then 40% of users will not be able to run the other 60% of replication packages without some potentially widespread edits, because of those backslashes.

You should thus replace all path names in your code to use / (forward slashes), or appropriate functions, and take care to write case-sensitive file and path names. This is straightforward:

Stata

// Instead of
use "data\analysis\combined_data.dta", clear
// Use
use "data/analysis/combined_data.dta", clear
// or better
use "$rootdir/data/analysis/combined_data.dta", clear

R

# Instead of
data <- read.csv("data\\analysis\\combined_data.csv")
# Use
data <- read.csv("data/analysis/combined_data.csv")
# or better
data <- read.csv(file.path(rootdir, "data", "analysis", "combined_data.csv"))

and similarly for other languages.
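Case mismatches are easy to miss on Windows and Mac, because the file opens anyway. A minimal sketch of a check, assuming a hypothetical helper named exact_case_exists: it compares the path as written in your code against the file names as stored on disk, so it also works on case-insensitive file systems.

```shell
# Sketch: succeed only if a relative path exists under a root with identical case.
# exact_case_exists is a hypothetical helper name.
exact_case_exists() {
  # $1 = project root, $2 = relative path exactly as written in the code
  ( cd "$1" && find . -type f | sed 's|^\./||' | grep -Fx -- "$2" >/dev/null )
}

# Demonstration: the file on disk uses capital letters, the code does not.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/data/analysis"
touch "$demo_root/data/analysis/Combined_Data.csv"

if exact_case_exists "$demo_root" "data/analysis/combined_data.csv"; then
  echo "match"
else
  echo "case mismatch or missing file: data/analysis/combined_data.csv"
fi
```

In this demonstration the second branch is taken, which is what a Linux replicator would experience as a "file not found" error.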

Implementing

In many cases, you can just globally replace all \ with / in your code files. Caution is warranted, however, if your code explicitly writes out LaTeX code, which also (legitimately) uses \. In that case, you will need to be more careful.

Expert tip

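If using a (Bash or Zsh) terminal, you likely have the sed command available. You can use it to replace all backslashes with forward slashes in all .do files in the code directory as follows (this is GNU sed syntax, as found on Linux and WSL; the BSD sed shipped with macOS requires sed -i '' instead of sed -i):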

sed -i 's+\\+/+g' code/*.do
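A slightly more cautious variant is sketched below: it first previews the affected lines (useful for spotting LaTeX output), then replaces while keeping a backup of each file. The .bak suffix and the scratch directory are arbitrary choices for illustration; writing the suffix directly after -i is accepted by both GNU sed and the BSD sed on macOS.

```shell
# Sketch: preview backslashes, then replace them while keeping a backup copy.
demo_code="$(mktemp -d)"
printf '%s\n' 'use "data\analysis\combined_data.dta", clear' > "$demo_code/03_table1-5.do"

# 1. Preview every line that contains a backslash (check for LaTeX output here).
grep -n '\\' "$demo_code"/*.do

# 2. Replace, leaving the original next to each file as <name>.do.bak.
sed -i.bak 's+\\+/+g' "$demo_code"/*.do

# 3. Inspect the result.
cat "$demo_code/03_table1-5.do"
```

Once you are satisfied with the result, the .bak files can be deleted so that they do not end up in the replication package.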

Step 3: Dependencies

Stata packages

Stata users frequently use user-written packages, which are made available to the Stata community via the Stata Journal, SSC, or Github. They are typically installed using a small number of variants of the net install command (including ssc install).

Replicators need to have the same versions of these packages installed. Stata does not (currently) provide a way to install older versions of packages, and a regular occurrence of reproducibility failure is due to changes in packages over time. We have some simple solutions to this problem.

First, use an environment to permanently install project-specific packages once and for all.

Define the environment in your main file, after setting $rootdir:

Reference: https://larsvilhuber.github.io/self-checking-reproducibility/12-environments-in-stata.html and https://github.com/AEADataEditor/replication-template/blob/master/template-config.do#L129.

/* install any packages locally */
di "=== Redirecting where Stata searches for ado files ==="
capture mkdir "$rootdir/ado"
adopath - PERSONAL
adopath - OLDPLACE
adopath - SITE
sysdir set PLUS     "$rootdir/ado/plus"
sysdir set PERSONAL "$rootdir/ado"       // may be needed for some packages
sysdir

From this point on, all installed packages will be installed into $rootdir/ado, and Stata will look there first when loading packages.

Install packages once if not present, but don’t reinstall if already present.

Reference: https://gist.github.com/larsvilhuber/d8b643a408d425ef2a80385b6377870d#file-part2_of_main-do-L14, though you should be able to just use your own install code as well, if it worked before.

*** Add required packages from SSC to this list ***
local ssc_packages ""
// Example:
// local ssc_packages "estout boottest"

display in red "============ Installing packages/commands from SSC ============="
display in red "== Packages: `ssc_packages'"
if !missing("`ssc_packages'") {
    foreach pkg in `ssc_packages' {
        capture which `pkg'
        if _rc == 111 {
            dis "Installing `pkg'"
            ssc install `pkg'
        }
        which `pkg'
    }
}
ado

Some special cases (usually not necessary)

For some packages, the package name is not the same thing as the command name. Example: moremata. For these packages, the above code does not work. Use this code:¹

Reference: https://gist.github.com/larsvilhuber/d8b643a408d425ef2a80385b6377870d#file-part2_of_main-do-L27

// If you have packages that need to be unconditionally installed (the name of the package differs from the included commands), then list them here.
// Examples are moremata, egenmore, blindschemes, etc.
local ssc_unconditional ""
/* add unconditionally installed packages */
display in red "=============== Unconditionally installed packages from SSC ==============="
display in red "== Packages: `ssc_unconditional'"
if !missing("`ssc_unconditional'") {
    foreach pkg in `ssc_unconditional' {
        dis "Installing `pkg'"
        cap ssc install `pkg'
    }
}
ado

Packages that are not on SSC may need to be net installed from other sources, including Github and personal websites. Again, this does not neatly work with a specific command check, and thus you may need to unconditionally install them. Use this code:

    // If you have packages that need to be unconditionally installed from other sources (not SSC), then list them here.
    // Example: grc1leg
  net install grc1leg, from("http://www.stata.com/users/vwiggins/")
    // Example when net install is not an option 
  cap mkdir "$rootdir/ado/plus/e"
  cap copy http://www.sacarny.com/wp-content/uploads/2015/08/ebayes.ado "$rootdir/ado/plus/e/ebayes.ado"
ado

Adding to replication package

The following files should be included in your replication package:

ado/* (the contents of $rootdir/ado, as populated by the code above)

R packages

For R packages, we suggest that users use renv, and do not set a specific CRAN mirror. We refer users to the renv documentation for details, but in a nutshell, for an existing R project that is not using renv, the following commands should be run in the R console:

install.packages("renv")  # only once
renv::init()               # only once per project
renv::snapshot()           # only once per project, after all packages are installed. When prompted, choose to install all detected packages, then snapshot.
renv::status()             # to check status

This will create a file renv.lock in the top-level directory of your project.

Adding to replication package

The following files should be included in your replication package:

.Rprofile
renv.lock
renv/activate.R
renv/settings.json

Do not include the entire renv directory, in particular not the renv/library subdirectory, as it is platform-specific (of no use to other platforms), and can be very large.
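Before packaging, a quick check along the following lines can confirm that the four files listed above are present and that renv/library is not swept in by accident. This is a sketch only; check_renv_files is a hypothetical helper name.

```shell
# Sketch: report which of the four renv files listed above are missing,
# and warn if renv/library is present.
# check_renv_files is a hypothetical helper name.
check_renv_files() {
  renv_root="$1"
  renv_rc=0
  for f in .Rprofile renv.lock renv/activate.R renv/settings.json; do
    if [ ! -f "$renv_root/$f" ]; then
      echo "missing: $f"
      renv_rc=1
    fi
  done
  if [ -d "$renv_root/renv/library" ]; then
    echo "exclude before packaging: renv/library"
  fi
  return "$renv_rc"
}

# Demonstration: a scratch project that lacks renv/settings.json.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/renv"
touch "$demo_root/.Rprofile" "$demo_root/renv.lock" "$demo_root/renv/activate.R"
check_renv_files "$demo_root" || echo "Fix the items above before packaging."
```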

Step 4: Displays

Displays (figures and tables) should be written out to external files, and the authors’ versions, as used in the manuscript, should be provided. In the prototypical replication package structure above, these files would be in the results directory.

Reference: https://larsvilhuber.github.io/self-checking-reproducibility/03-automatically_saving_figures.html and https://github.com/labordynamicsinstitute/replicability-training/wiki/How-to-output-tables-and-figures

Figures

All figures can be written out to files. Journals like PDF and EPS files, but PNG files are convenient. You can output multiple formats.
Whenever you display a figure, also export it to a file. It is a simple command.

Stata

// Example for PNG
graph export "$rootdir/results/figure1.png", replace width(1200) height(800) 
// Example for PDF
graph export "$rootdir/results/figure1.pdf", replace

R

# Example for PNG if using standard R
png(filename = file.path(rootdir, "results", "figure1.png"), width = 1200, height = 800)
plot(x, y)  # your plotting code here
dev.off() 
# Example if using ggplot2
ggsave(filename = file.path(rootdir, "results", "figure1.png"), plot = myplot, width = 12, height = 8, units = "in", dpi = 100)

More complex figures

For more complex figures, it may be easier to simply write out the data underlying the figure to an Excel sheet, and create the figure there. See https://github.com/labordynamicsinstitute/replicability-training/wiki/How-to-output-tables-and-figures#arbitrary-data-to-excel on how to write out the underlying data. You would then include the Excel file that maps the data into a figure with your replication package.

Tables

Tables may be more complex. Simple tables can be written out using various tools:

Stata

esttab or outreg2, also putexcel. For more elaborate tables, treat tables as data, and use regsave or export excel to write them out for further manipulation.

R

xtable, stargazer, others.

More complex tables

For more complex tables, it may be easier to simply write out entire matrices, or individual numbers, to an Excel sheet, and compose the table there. See https://github.com/labordynamicsinstitute/replicability-training/wiki/How-to-output-tables-and-figures#examples for an example, especially if you have already been compiling your tables in Excel. You would then include the Excel file that maps the data into your preferred table layout with your replication package.

Step 5: Testing in containers

After you have made all the above changes, you should test your code in an appropriate authorized container.

⚠️❗ IMPORTANT: If you do not have Docker installed on your computer, do not have the rights to install Docker on your computer, or do not have access otherwise to Docker, please do not attempt this, and skip straight to the alternative approach.

Reference: https://larsvilhuber.github.io/self-checking-reproducibility/80-docker.html

Authorized containers

The following containers are authorized for testing, as they are reliably available and achieve the desired transparency.

Stata containers for versions from 19 (StataNow) back to 11, provided by the Social Science Data Editors at https://hub.docker.com/u/dataeditors, such as dataeditors/stata18_5-mp:2025-02-26. (requires a license)
R containers provided by the Rocker Project, such as rocker/r-ver:4.3.1 or rocker/tidyverse:4.3.1 (which includes the tidyverse packages).
MATLAB + Dynare containers provided by the Dynare Project at https://hub.docker.com/r/dynare/dynare, such as dynare/dynare:5.3-2024-05-21. See the project page for the mapping between containers and MATLAB versions. (requires a MATLAB license)

If you know of a different container that we should add to this list, please let us know. The AEA Data Editor’s Github profile has a few other containers that have worked, but may be too advanced for the typical user.

⚠️❗ IMPORTANT: Do not provide us with a custom container that is not on the above list. Transparency requires that the container be built, using a Dockerfile or apptainer.def file, from publicly available sources. While we will happily use your container, it must be built from one of the above sources, or well-known “standard” sources, such as “Docker Official Images” in the Dockerhub library space (e.g., https://hub.docker.com/_/python).

Steps

Install the software necessary for running containers.
- For Windows, install Docker Desktop for Windows.
- For Mac, install Docker Desktop for Mac or OrbStack.
- For Linux, install Docker engine, Podman, or use Apptainer. These can all also be installed on Windows under Windows Subsystem for Linux (WSL).
All example commands below are from a Bash or Zsh terminal, which are standard on Mac and Linux, as well as on Windows if using WSL. If you do not have WSL on Windows and are using the Powershell, the same principles apply, but the syntax may be different.
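To see whether one of these tools is already available, a check such as the following may help. It is a sketch: container_runtime is a hypothetical helper name, and the order in which the tools are tried is an arbitrary assumption.

```shell
# Sketch: report the first container runtime found on the PATH.
# container_runtime is a hypothetical helper; the preference order is arbitrary.
container_runtime() {
  for tool in docker podman apptainer; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  echo "none"
  return 1
}

runtime="$(container_runtime)" || true
echo "Container runtime found: $runtime"
```

Finding the command on the PATH does not guarantee that it is usable (for instance, the Docker service may not be running), so the container test further below remains the real check.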

When code has been adjusted as in Steps 1-4, no complex adjustment of containers is necessary.

Run the container, mounting your project directory into the container. For example, if your project is in /my/computer/users/me/project, you would use a command such as this (example for Stata):

Preliminaries

(may need some adjustment, depending on your license)

VERSION=18_5
TAG=2025-02-26
MYHUBID=dataeditors
MYIMG=stata${VERSION}
TYPE=mp
CONTAINER=$MYHUBID/${MYIMG}-${TYPE}:${TAG}
STATALIC=/path/to/your/stata/stata.lic

Explanations:

VERSION: This is the Stata version. StataNow is referenced with a _5 suffix, otherwise, this corresponds to your (major) Stata version number.
TAG: This is the date the container was built, in YYYY-MM-DD format. Recent Stata containers do not (on purpose) have a latest tag, but older ones (that are no longer maintained) do, so for those you can replace the date with latest.
TYPE: This is the Stata edition matching your license, for example mp or se.
CONTAINER: is the fully qualified name of the container to be used. It is built from various components. For Stata images, these are maintained by dataeditors on Dockerhub. All available Stata containers and tags can be viewed on https://hub.docker.com/u/dataeditors. The precise way to call the container may depend on the version. For instance, for versions prior to 18, the -${TYPE} suffix is not used.
STATALIC: Is the path (in the notation used by the terminal you are using) to your Stata license file stata.lic. You need to have a valid Stata license file for the version of Stata you are using.

If you have only an older license, or a non-MP license, you may need to replace VERSION, TAG, and TYPE accordingly. For instance, if you have a Stata 16 SE license, you would set VERSION=16, TAG=2023-06-13, and TYPE=se, and remove -${TYPE} from the CONTAINER definition.
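The naming rule can be sketched as a small shell function. The helper name stata_container is hypothetical; the rule that versions before 18 carry no -TYPE suffix is taken from the explanation above, and the two calls use the examples already given in the text.

```shell
# Sketch: assemble the image name from its parts.
# stata_container is a hypothetical helper name.
stata_container() {
  version="$1"; edition="$2"; tag="$3"
  major="${version%%_*}"          # "18_5" -> "18", "16" -> "16"
  if [ "$major" -ge 18 ]; then
    echo "dataeditors/stata${version}-${edition}:${tag}"
  else
    # Versions prior to 18 do not use the -TYPE suffix.
    echo "dataeditors/stata${version}:${tag}"
  fi
}

stata_container 18_5 mp 2025-02-26   # prints dataeditors/stata18_5-mp:2025-02-26
stata_container 16 se 2023-06-13     # prints dataeditors/stata16:2023-06-13
```

You could then set CONTAINER="$(stata_container 16 se 2023-06-13)" instead of editing the definition by hand. Whether a given tag actually exists should still be checked on https://hub.docker.com/u/dataeditors.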

Test the container

docker run -it --rm \
  --volume "${STATALIC}":/usr/local/stata/stata.lic \
  --entrypoint stata-${TYPE} \
  ${CONTAINER}

You should see the usual Stata prompt. Type exit to leave Stata.

Run the container

docker run -it --rm \
  --volume "${STATALIC}":/usr/local/stata/stata.lic \
  --volume "$(pwd)":/project \
  --workdir /project \
  --entrypoint stata-${TYPE} \
  ${CONTAINER} -b main.do

if using a Scenario B setup. If using a Scenario A setup, use

docker run -it --rm \
  --volume "${STATALIC}":/usr/local/stata/stata.lic \
  --volume "$(pwd)":/project \
  --workdir /project/code \
  --entrypoint stata-${TYPE} \
  ${CONTAINER} -b main.do

Alternative approach

If you cannot run Docker on your computer, we make available the SIVACOR service, which allows you to run your code using authorized containers without the need to install software on your own computer, producing a Trusted Research Object (TRO).

In fact, we will run your code using this same system to verify compliance with all of the above steps!

For more information on how to use SIVACOR, see https://docs.sivacor.org/. Once you have successfully run your code on SIVACOR, provide the generated certified ZIP file instead of the original replication package to the Data Editor. A TRO does not need to be re-run by the Data Editor.

Fallback: Run on a different computer

If you do not have Docker or cannot install it, and you cannot use SIVACOR, use this fallback approach to test your code:

Download your entire replication package from the draft openICPSR deposit, onto a different computer where you have not previously run the code.
Run the code from that new location.
- For Stata, close all Stata windows, and then double-click on the main.do file. This should generate a main.log file in the same directory as main.do.
- For R, from a terminal or the RStudio Terminal tab, type R CMD BATCH main.R, or if using renv, R --no-save --no-restore -f main.R > main.Rout.² This should generate a main.Rout file in the same directory as main.R.

We note that in our experience, this approach is much less reliable.

Success

If your code does run into problems, the generated main.log or main.Rout should have clues as to what went wrong. You should be able to fix these issues, and re-run the code in the container, until it runs without error.
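For Stata, errors show up in the log as a return code on its own line, such as r(601);. A quick way to locate them is sketched below; stata_log_errors is a hypothetical helper name, and the log content is made up for the demonstration.

```shell
# Sketch: list Stata return codes of the form r(601); found in a log file.
# stata_log_errors is a hypothetical helper name.
stata_log_errors() {
  grep -nE '^r\([0-9]+\);' "$1"
}

# Demonstration with a made-up three-line log.
demo_log="$(mktemp)"
printf '%s\n' \
  '. use "$rootdir/data/analysis/combined_data.dta", clear' \
  'file /project/data/analysis/combined_data.dta not found' \
  'r(601);' > "$demo_log"

if stata_log_errors "$demo_log"; then
  echo "Errors found in log"
else
  echo "No Stata error codes found"
fi
```

For your own run, you would call stata_log_errors main.log. For R, an uncaught error typically leaves the text "Execution halted" at the end of main.Rout, which can be searched for in the same way.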

If your code runs without error, and produces all expected output files, you are done!
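Checking that all expected output files were produced can also be scripted. The sketch below assumes a hypothetical helper named missing_outputs, and uses the file names from the example layout at the top of this document; your own list will differ.

```shell
# Sketch: compare a list of expected output files with what the run produced.
# missing_outputs is a hypothetical helper name.
missing_outputs() {
  # $1 = results directory, remaining arguments = expected file names
  results_dir="$1"; shift
  for f in "$@"; do
    [ -f "$results_dir/$f" ] || echo "missing: $f"
  done
}

# Demonstration: only two of the four expected files exist.
demo_results="$(mktemp -d)"
touch "$demo_results/table1.tex" "$demo_results/figure1.png"
missing_outputs "$demo_results" table1.tex table2.xlsx figure1.png figure2.pdf
```

An empty result means that every file on your list was found.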

Finalize README

Reference: https://social-science-data-editors.github.io/template_README/template-README.html

This step is usually not necessary, but you should make sure that your README contains the information that helps set expectations about computational feasibility, based on the steps above.

Software: If you used a container, specify which container you used (name and tag, e.g., dataeditors/stata18_5-mp:2025-02-26). Be precise when describing the StataMP version - the number of cores matters! (StataMP-4 may not behave the same way as StataMP-8.)
Hardware: Verify that the description of your computer (CPU, number of cores, RAM, disk space) is accurate.
Run time: Provide an estimate of the expected run time, however trivial it might be. It matters to the replicator!

Examples:

- OS: "openSUSE Leap 15.6"
- Processor:  13th Gen Intel(R) Core(TM) i7-1365U, 12 cores
- Memory available: 30GB memory
- Docker version 28.4.0-ce, build 249d679a6 
- stata version 18-mp-i (Docker image dataeditors/stata18-mp-i:2024-12-18) (born date: "18 Dec 2024") with 32 core license

Code ran for about 35 hours.

- OS: Windows Server AMD EPYC 7763 64-Core Processor 2.44 GHz, 128GB
- Stata/MP4 19.5 ("21 May 2025")
- MATLAB R2025a

Code runs about 10 minutes for Stata portion, and about 5 days for MATLAB portion.
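Some of this information can be collected from the terminal. The following is a rough sketch with a hypothetical helper named describe_machine; the commands available differ across systems, so treat the output as a starting point and adjust as needed.

```shell
# Sketch: print a few facts for the README.
# describe_machine is a hypothetical helper name.
describe_machine() {
  echo "OS: $(uname -s) $(uname -r)"
  # getconf _NPROCESSORS_ONLN is available on Linux and macOS; fall back otherwise.
  echo "Cores: $(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)"
  if command -v docker >/dev/null 2>&1; then
    echo "Docker: $(docker --version)"
  fi
}

describe_machine
```

For the run time, prefixing the command that runs your code with time (in Bash or Zsh) reports the elapsed time once the run has finished.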

Submitting

You can now submit your replication package to the Data Editor, along with the completed checklist from above, and the generated main.log/main.Rout as evidence.

Problems?

If you run into problems in Step 5, no worries, simply submit all the files as modified in Steps 1-4, along with the completed checklist, and we will handle the remaining issues.

¹ A more customized setup might check for a package-specific file in the ado directory, such as the <package>.pkg, but this is more complex and may not always work.
² In PowerShell, you can use R --no-save --no-restore -f main.R | Out-File -Encoding UTF8 main.Rout.


AEA Data Editor