# AEA Replication Package Preparation: Agent Instructions

This document contains official guidance from the AEA Data Editor for preparing
an economics replication package for submission to the American Economic
Association. It is auto-generated for AI agent use from the source documents at
https://aeadataeditor.github.io/aea-de-guidance/preparing-replication-package

## How to use these instructions

You are an AI assistant helping an economics researcher prepare their replication
package for submission to the AEA Data Editor. Work through each step in order:

1. Review the replication package against each checklist item.
2. Identify issues or missing elements.
3. Suggest specific, actionable fixes — include code examples where relevant.
4. Confirm each step is satisfied before moving to the next.

**Note on links:** Bare link targets such as `(preparing-replication-package-step1)`
are relative references to pages at https://aeadataeditor.github.io/aea-de-guidance/. The full content of all
referenced pages is already included in this document, so you do not need to
follow those links.


---


> The steps in this document are being used in a pilot project. 

This document describes how to prepare your code for verification, taking into account some of the most frequent issues that the Data Editor and his team have encountered in submitted replication packages.


> ⚠️❗ **IMPORTANT:** At this point, you should only be seeing this page if you were asked by the Data Editor team to do so, and if your replication package relies on a single software. Admissible containers are listed in the [Step 5 section: authorized containers](#authorized-containers). We are not currently attempting to generalize this to multi-software replication packages, though [it](https://github.com/AEADataEditor/docker-r-gurobi) [is](https://github.com/AEADataEditor/docker-aer-2022-0276) [possible](https://github.com/AEADataEditor/docker-aer-2023-0505) [to do so](https://github.com/AEADataEditor/docker-aer-2023-0700).

## Overview

We will describe a few checks and edits you should make to your code, in order to ensure maximum reproducibility. We then describe how to test for reproducibility before submitting to the Data Editor. All steps have been tested by undergraduate replicators, and should be easy to implement. Whether they take a lot of time depends on your specific code, but generally, these adjustments can be made by somebody with good knowledge of the code base very quickly.

Much more extensive guidance on the issues addressed here is available at <https://larsvilhuber.github.io/self-checking-reproducibility/>. We reference specific chapters there at each of the steps.

> ⚠️❗ **IMPORTANT:** All but the last step can be done by anybody, with no special system requirements, and independently of your ability to share confidential data. However, the last step may not be possible at an institution that does not allow you to install container software (Docker, OrbStack, etc.) and does not have such technology installed on a Linux cluster. We provide a public website where you can leverage containers for verification, but you should not use it for confidential data. In that case, please do all the other steps.


## Using an AI assistant

You can use an AI assistant (such as Claude, ChatGPT, or GitHub Copilot) to guide you through this checklist. Give your AI the following instruction:

> Use <https://aeadataeditor.github.io/aea-de-guidance/preparing-replication-package.agent.md> to review my replication package and help me prepare it for submission to the AEA Data Editor.

The AI will work through each step with you, identify issues, and suggest specific fixes.

## Checklist

Print off (as PDF or on paper) the following checklist, and tick off each item as you complete it. Provide the completed checklist as part of the replication package.

- [ ] [**Step 1: Main file**](preparing-replication-package-step1): A single main file is provided that runs all code.  [Details](preparing-replication-package-step1)
- [ ] [**Step 2: Path names**](preparing-replication-package-step2): All paths in code use `/` (forward slashes) relative to a single top-level project directory (`$rootdir`, `$basedir`, etc.). The top-level project directory is set dynamically, not hard-coded (explanations below).  [Details](preparing-replication-package-step2)
- [ ] [**Step 3: Dependencies**](preparing-replication-package-step3): All packages/libraries/dependencies are installed via code once.  [Details](preparing-replication-package-step3)
  - [ ] For Stata, these packages are installed into a subdirectory in the project (`$rootdir/ado`, `$basedir/adofiles`, etc.), and used by the code.
  - [ ] For R, `renv` is used (exceptions made for other package management systems if such a system is explained).
  - [ ] For Python, environments are used (native `venv` or `conda`), and the necessary top-level requirements specified (no OS-specific dependencies are included).
- [ ] [**Step 4: Displays**](preparing-replication-package-step4): All figures and tables are written out to clearly identified external files, and the authors' versions, as used in the manuscript, are provided.  [Details](preparing-replication-package-step4)
- [ ] [**Step 5: Testing on AEA-maintained website**](preparing-replication-package-step5): After all changes were made, the code was run using the referenced website, a certified ZIP file was created, and it is provided instead of the original replication package (alternatives exist for certain situations).  [Details](preparing-replication-package-step5)
- [ ] (usually not necessary) [**Finalize**](preparing-replication-package-finalize): Update the README with the necessary information about computer specifications, Docker image used, memory and disk space requirements, and expected runtime. 



## Submitting

You can now submit your replication package to the Data Editor, along with the completed checklist from above, and the generated `main.log`/`main.Rout` as evidence.

## Problems?

If you run into problems at any step, please reach out. If you only run into problems in Step 5, no worries, simply submit all the files as modified in Steps 1-4, along with the completed checklist, and we will handle the remaining issues.


---



> You may or may not have a main file. The following should be adapted to your circumstances. You do not need to create a file that is called `main.do` if you already have one, but you may need to update your existing main file.

> Reference: <https://larsvilhuber.github.io/self-checking-reproducibility/02-hands_off_running.html>

Creating a single main file is straightforward. However, you will want to make some minor edits depending on where, in the above template setup, the file is located:

## Scenario A: `main` is in the `code` directory

The most frequent scenario we see (which we call **Scenario A**) amongst economists is that the main file is in the `code` directory:

```
README.pdf
data/
...
code/
  main.do
  01_readcps.do
  02_readfred.do
...
```

In this case, the following generic main file will work, with `scenario` set to `"A"`.

```stata
// Stata example
local scenario "A"          // Scenario A: main is in code directory
local pwd : pwd                     // This always captures the current directory

if "`scenario'" == "A" {             // If in Scenario A, we need to change directory first
    cd ..
}
global rootdir : pwd                // Now capture the directory to use as rootdir
display in red "Rootdir has been set to: $rootdir"
cd "`pwd'"                            // Return to where we were before and never again use cd

// Now run the rest of the code
do "$rootdir/code/01_readcps.do"
do "$rootdir/code/02_readfred.do"
do "$rootdir/code/03_table1-5.do"
do "$rootdir/code/04_figures1-4.do"
```

## Scenario B: `main` is in the top-level directory

More common in other computational sciences, but also present amongst economists, is that the main file is in the top-level directory:


```
README.pdf
main.do
data/
...
code/
  01_readcps.do
  02_readfred.do
...
```


In this case, the following generic main file will work, with `scenario` set to `"B"` (though see [Step 3 Dependencies](preparing-replication-package-step3)).

```stata
// Stata example
local scenario "B"          // Scenario B: main is in project top-level directory
local pwd : pwd                     // This always captures the current directory

if "`scenario'" == "A" {             // If in Scenario A, we need to change directory first
    cd ..
}
global rootdir : pwd                // Now capture the directory to use as rootdir
display in red "Rootdir has been set to: $rootdir"
cd "`pwd'"                            // Return to where we were before and never again use cd

// Now run the rest of the code
do "$rootdir/code/01_readcps.do"
do "$rootdir/code/02_readfred.do"
do "$rootdir/code/03_table1-5.do"
do "$rootdir/code/04_figures1-4.do"
```
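
The same pattern carries over to other languages. The following is a minimal Python sketch, not part of the official guidance: the function names `find_rootdir` and `run_all` are hypothetical, and it assumes that each step is a standalone script in `code/` that expects a predefined `rootdir` variable.

```python
import runpy
from pathlib import Path


def find_rootdir(main_file, scenario):
    """Derive the project root from the location of the main file.

    Scenario "A": the main file is in code/, so the root is one level up.
    Scenario "B": the main file is already in the top-level directory.
    """
    here = Path(main_file).resolve().parent
    return here.parent if scenario == "A" else here


def run_all(rootdir, scripts):
    """Run each script in code/ in the given order; an exception stops the run."""
    for name in scripts:
        script = Path(rootdir) / "code" / name
        print(f"Running {script}")
        # Each script sees `rootdir` as a predefined global variable.
        runpy.run_path(
            str(script),
            init_globals={"rootdir": Path(rootdir)},
            run_name="__main__",
        )
```

A `main.py` could then call, for example, `run_all(find_rootdir(__file__, "A"), ["01_readcps.py", "02_readfred.py"])`, where the script names are placeholders mirroring the Stata example.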

## Important

> In neither scenario did we hard-code the path to our project directory `/my/computer/users/me/project`. This is not an omission, and it is important, because it allows the code to be run on any computer, without modification.

Finally, you should not hard-code your `rootdir`. Set the **project root directory dynamically**:

```stata
// Stata:
global rootdir : pwd
// Example
global datadir   "$rootdir/data/raw"
global outputdir "$rootdir/data/clean"
```

```r
# R:
# if using the here package:
rootdir <- here::here()
# or the rprojroot package
rootdir <- rprojroot::find_root(rprojroot::has_file("README.pdf"))  # or other marker file
# Example
datadir   <- file.path(rootdir, "data", "raw")
outputdir <- file.path(rootdir, "data", "clean")
```

```python
# Python:
from pathlib import Path

# Set directories
code_dir = Path(__file__).resolve().parent
rootdir = code_dir.parent
# Example
datadir   = rootdir / "data" / "raw"
outputdir = rootdir / "data" / "clean"
```

> IMPORTANT: your code MUST contain the line (Stata) `global rootdir : pwd` (or equivalent) to set the project root directory dynamically.
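
One way to check for leftover hard-coded paths is to scan the code files for patterns that usually indicate a machine-specific location. The sketch below is only a heuristic: the function name `find_hardcoded_paths`, the list of file suffixes, and the patterns themselves (Windows drive letters, `/Users/`, `/home/`) are assumptions for illustration, and the scan can produce false positives or miss other forms.

```python
import re
from pathlib import Path

# Illustrative patterns only: a drive letter such as C:\ or C:/, or a typical
# macOS/Linux home directory. The lookbehind avoids matching URLs like http://.
HARDCODED = re.compile(r"(?<![A-Za-z])[A-Za-z]:[\\/]|/Users/|/home/")


def find_hardcoded_paths(code_dir, suffixes=(".do", ".R", ".py")):
    """Return (file name, line number, line) for every suspicious line."""
    hits = []
    for path in sorted(Path(code_dir).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if HARDCODED.search(line):
                    hits.append((path.name, lineno, line.strip()))
    return hits
```

Any line reported by `find_hardcoded_paths("code")` is a candidate for rewriting in terms of `rootdir`.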

## Creating directories programmatically

If your code uses directories that may start out empty, or may not exist on the replicators' computers, you must create them programmatically. 


```stata
// Stata:
cap mkdir "$outputdir"
```

```r
# R:
dir.create(outputdir, showWarnings = FALSE, recursive = TRUE)
```

```python
# Python:
outputdir.mkdir(parents=True, exist_ok=True)
```


---



Two issues:

- Windows computers use `\` (backslashes) in path names, while Mac and Linux computers use `/` (forward slashes). The use of `\` (backslashes) in path names breaks code on Mac and Linux computers.

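- Windows and Mac computers use case-insensitive file systems by default, while Linux computers use case-sensitive file systems. A file saved as `Data.dta` but referenced as `data.dta` will load on the former and fail on the latter.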

Both of these issues need to be addressed. You are helped by a straightforward but often forgotten (or unknown) observation:


- **Every statistical programming language can use generic path names using `/` (forward slashes).** This ensures wide reproducibility.

About 40% of replication packages in economics appear to be submitted by researchers using computers running macOS or Linux. With a bit of simplified math, if we believe that is representative of what future replicators will do, that means that 40% of users will not be able to run 60% of replication packages without some potentially widespread edits, because of those backslashes.

You should thus **replace all path names in your code to use `/` (forward slashes)**, or appropriate functions, and take care to write **case-sensitive file and path names**. This is straightforward:

## Stata

```stata
// Instead of
use "data\analysis\combined_data.dta", clear
// Use
use "data/analysis/combined_data.dta", clear
// or better
use "$rootdir/data/analysis/combined_data.dta", clear
```

## R

```r
# Instead of
data <- read.csv("data\\analysis\\combined_data.csv")
# Use
data <- read.csv("data/analysis/combined_data.csv")
# or better
data <- read.csv(file.path(rootdir, "data", "analysis", "combined_data.csv"))
```

and similarly for other languages.
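
Case mismatches are harder to spot than backslashes, because they go unnoticed on Windows and Mac. The following Python sketch shows one way to test whether a path referenced in the code matches the names on disk exactly. The function name `exists_with_exact_case` is hypothetical, and the check assumes paths are written relative to the project root with `/` separators.

```python
import os
from pathlib import Path


def exists_with_exact_case(rootdir, relative_path):
    """Return True only if every path component matches the on-disk name exactly.

    os.listdir() reports names as stored, so the comparison is case-sensitive
    even on a case-insensitive file system.
    """
    current = Path(rootdir)
    for part in Path(relative_path).parts:
        if not current.is_dir() or part not in os.listdir(current):
            return False
        current = current / part
    return True
```

For example, `exists_with_exact_case(rootdir, "data/analysis/combined_data.dta")` returns `False` if the file on disk is actually named `Combined_Data.dta`, even on a computer where opening it would succeed.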


## Implementing

In many cases, you can just globally replace all `\` with `/` in your code files. Caution is warranted, however, if your code explicitly writes out LaTeX code, which also (legitimately) uses `\`. In that case, you will need to be more careful.

## Expert tip

If using a (Bash or Zsh) terminal, you likely have the `sed` command available. You can use it to replace all backslashes with forward slashes in all `.do` files in the `code` directory as follows:

```bash
# GNU sed (Linux, WSL); on macOS (BSD sed), use: sed -i '' 's+\\+/+g' code/*.do
sed -i 's+\\+/+g' code/*.do
```
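
When the code also writes LaTeX, a more selective replacement may be safer than a global one. The sketch below is one possible heuristic, not a complete solution: it only touches double-quoted strings that end in a known file extension. The function name `fix_path_separators` and the list of extensions are assumptions, and a LaTeX string that happens to end in one of those extensions would still be altered, so the result should be reviewed.

```python
import re

# A double-quoted string ending in one of these extensions is treated as a path.
# The extension list is an assumption; extend it to match your project.
PATH_LIKE = re.compile(
    r'"[^"\n]*\.(?:dta|csv|do|ado|txt|xlsx|log|png|pdf|eps|tex)"'
)


def fix_path_separators(source):
    """Replace backslashes with forward slashes inside path-like strings only."""
    return PATH_LIKE.sub(lambda m: m.group(0).replace("\\", "/"), source)
```

Applied to `use "data\analysis\combined_data.dta", clear`, this rewrites the path, while a line such as `file write fh "\begin{tabular}{lc}" _n` is left unchanged.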


---



## Stata packages

Stata users frequently use user-written packages, which are made available to the Stata community via the [Stata Journal](https://www.stata-journal.com/), [SSC](https://ideas.repec.org/s/boc/bocode.html), or Github. They are typically installed using a small number of variants of the `net install` command (including `ssc install`).

Replicators need to have the same versions of these packages installed. Stata does not (currently) provide a way to install older versions of packages, and a regular occurrence of reproducibility failure is due to changes in packages over time. We have some simple solutions to this problem.

First, use an environment to permanently install project-specific packages once and for all.


**Define the environment** in your main file, after setting `$rootdir`:


> Reference: <https://larsvilhuber.github.io/self-checking-reproducibility/12-environments-in-stata.html> and <https://github.com/AEADataEditor/replication-template/blob/master/template-config.do#L129>.


```stata
/* install any packages locally */
di "=== Redirecting where Stata searches for ado files ==="
capture mkdir "$rootdir/ado"
adopath - PERSONAL
adopath - OLDPLACE
adopath - SITE
sysdir set PLUS     "$rootdir/ado/plus"
sysdir set PERSONAL "$rootdir/ado"       // may be needed for some packages
sysdir
```

From this point on, all installed packages will be installed into `$rootdir/ado`, and Stata will look there first when loading packages.

**Install packages once** if not present, but don't reinstall if already present.


> Reference: <https://gist.github.com/larsvilhuber/d8b643a408d425ef2a80385b6377870d#file-part2_of_main-do-L14>, though you should be able to just use your own install code as well, if it worked before.

```stata
*** Add required packages from SSC to this list ***
local ssc_packages ""
    // Example:
    // local ssc_packages "estout boottest"
    //
    display in red "============ Installing packages/commands from SSC ============="
    display in red "== Packages: `ssc_packages'"
    if !missing("`ssc_packages'") {
        foreach pkg in `ssc_packages' {
            capture which `pkg'
            if _rc == 111 {
               dis "Installing `pkg'"
                ssc install `pkg'
            }
            which `pkg'
        }
    }
 ado
```

**Some special cases** (usually not necessary)

*For some packages, the package name is not the same thing as the command name.* Example: `moremata`. For these packages, the above code does not work. Use this code:[^unconditional-packages]

[^unconditional-packages]: A more customized setup might check for a package-specific file in the `ado` directory, such as the `<package>.pkg`, but this is more complex and may not always work.


> Reference: <https://gist.github.com/larsvilhuber/d8b643a408d425ef2a80385b6377870d#file-part2_of_main-do-L27>

```stata
    // If you have packages that need to be unconditionally installed (the name of the package differs from the included commands), then list them here.
    // examples are moremata, egenmore, blindschemes, etc.
local ssc_unconditional ""
/* add unconditionally installed packages */
    display in red "=============== Unconditionally installed packages from SSC ==============="
    display in red "== Packages: `ssc_unconditional'"
    if !missing("`ssc_unconditional'") {
        foreach pkg in `ssc_unconditional' {
            dis "Installing `pkg'"
            cap ssc install `pkg'
        }
    }
 ado
```

*Packages that are not on SSC may need to be `net install`ed from other sources,* including Github and personal websites. Again, this does not neatly work with a specific command check, and thus you may need to unconditionally install them. Use this code:


```stata
    // If you have packages that need to be unconditionally installed from other sources (not SSC), then list them here.
    // Example: grc1leg
  net install grc1leg, from("http://www.stata.com/users/vwiggins/")
    // Example when net install is not an option
  cap mkdir "$rootdir/ado/plus/e"
  cap copy http://www.sacarny.com/wp-content/uploads/2015/08/ebayes.ado "$rootdir/ado/plus/e/ebayes.ado"
 ado
```


**Indexing Mata libraries** (sometimes necessary, always useful)

When Stata packages include Mata libraries, the (separate) Mata index of such files needs to be updated. This is a very quick operation, and never hurts to include it, after all installs.

```stata
    mata: mata mlib index
```

***Adding to replication package***

The following files should be included in your replication package:

```bash
code/ado/*
```

## R packages

For R packages, we suggest that users use `renv`, and do not set a specific CRAN mirror. We refer users to the [renv documentation](https://rstudio.github.io/renv/articles/renv.html) for details, but in a nutshell, for an existing R project that is not using `renv`, the following commands should be run in the R console:

```r
install.packages("renv")  # only once
renv::init()               # only once per project
renv::snapshot()           # only once per project, after all packages are installed. When prompted, choose to install all detected packages and then snapshot.
renv::status()             # to check status
```

This will create a file `renv.lock` in the top-level directory of your project.

***Adding to replication package***

The following files should be included in your replication package:

```bash
.Rprofile
renv.lock
renv/activate.R
renv/settings.json
```

Do not include the entire `renv` directory, in particular not the `renv/library` subdirectory, as it is platform-specific (of no use to other platforms), and can be very large.
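
The checklist also asks Python users to work in an environment (`venv` or `conda`) with top-level requirements specified. As a rough sketch, the following code reads a requirements-style listing and reports which packages are not installed in the active environment. The file name `requirements.txt`, the one-package-per-line format, and the function names are assumptions for illustration.

```python
import re
from importlib import metadata


def parse_requirements(text):
    """Extract bare package names, skipping comments, blank lines, and options."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line and not line.startswith("-"):
            # Keep only the name, dropping version specifiers such as ==1.2 or >=2.0
            names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names


def missing_packages(names):
    """Return the packages from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_packages(parse_requirements(open("requirements.txt").read()))` near the top of the main file gives an early, readable failure instead of an import error halfway through the run.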


---



Displays (figures and tables) should be written out to external files, and the authors' versions, as used in the manuscript, should be provided. In the prototypical replication package structure above, these files would be in the `results` directory.

> Reference: <https://larsvilhuber.github.io/self-checking-reproducibility/03-automatically_saving_figures.html> and <https://github.com/labordynamicsinstitute/replicability-training/wiki/How-to-output-tables-and-figures>

## Figures

- All figures can be written out to files. Journals prefer `pdf` and `eps` files, but `png` files are convenient. You can output multiple formats.
- Whenever you display a figure, also `export` it to a file. It's a simple command.

### Stata

```stata
// Example for PNG
graph export "$rootdir/results/figure1.png", replace width(1200) height(800)
// Example for PDF
graph export "$rootdir/results/figure1.pdf", replace
```

### R

```r
# Example for PNG if using standard R
png(filename = file.path(rootdir, "results", "figure1.png"), width = 1200, height = 800)
plot(x, y)  # your plotting code here
dev.off()
# Example if using ggplot2
ggsave(filename = file.path(rootdir, "results", "figure1.png"), plot = myplot, width = 12, height = 8, units = "in", dpi = 100)
```

### More complex figures

For more complex figures, it may be easier to simply write out the data underlying the figure to an Excel sheet, and create the figure there. See <https://github.com/labordynamicsinstitute/replicability-training/wiki/How-to-output-tables-and-figures#arbitrary-data-to-excel>  on how to write out the underlying data. **You would then include the Excel file that maps the data into a figure with your replication package.**


## Tables

Tables may be more complex. Simple tables can be written out using various tools:

### Stata

`esttab` or `outreg2`, also `putexcel`. For fancier stuff, treat tables as data, use `regsave` or `export excel` to manipulate.

### R

`xtable`, `stargazer`, others.

### More complex tables

For more complex tables, it may be easier to simply write out entire matrices, or individual numbers, to an Excel sheet, and compose the table there. See <https://github.com/labordynamicsinstitute/replicability-training/wiki/How-to-output-tables-and-figures#examples> for an example, especially if you have already been compiling your tables in Excel. **You would then include the Excel file that maps the data into your preferred table layout with your replication package.**
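
The "tables as data" idea works in any language. As a sketch, the following Python code writes a small table of results to a file in the `results` directory. It uses CSV rather than Excel so that only the standard library is needed; that substitution, the function name `write_table`, and the example values in the usage note are assumptions for illustration.

```python
import csv
from pathlib import Path


def write_table(rows, header, outfile):
    """Write a header and rows of values to a CSV file, creating the directory if needed."""
    outfile = Path(outfile)
    outfile.parent.mkdir(parents=True, exist_ok=True)
    with outfile.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return outfile
```

A call such as `write_table([["Treatment", 0.123, 0.045]], ["Variable", "Coefficient", "Std. Err."], rootdir / "results" / "table1.csv")` then produces a file that can be opened in Excel or read by LaTeX tooling.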


---



After you have made all the above changes, you should test your code in an appropriate **authorized** container. To make this simple, we have set up a public website that hides the complexity of running containers from you. You only need to choose the software; the system will run the properly configured code automatically.

## Using the SIVACOR website

We have developed the [SIVACOR](https://sivacor.org) service, which allows you to run your code using authorized containers without the need to install software on your own computer, producing a Trusted Research Object (TRO).

> In fact, we will run your code using this same system to verify compliance with all of the above steps!


For more information on how to use SIVACOR, see <https://docs.sivacor.org/>. Once you have successfully run your code on SIVACOR, provide the generated certified ZIP file instead of the original replication package to the Data Editor. A TRO does not need to be re-run by the Data Editor.


## Authorized containers

SIVACOR uses a curated list of containers, chosen because they are reliably available and achieve the desired transparency. You can inspect the most current list at <https://docs.sivacor.org/docs/images/>. In general, Stata, R, and MATLAB (with Dynare) are supported.

If you know of a different container that we should add to this list, please let us know. The [AEA Data Editor's Github profile](https://github.com/AEADataEditor/) has a few other containers that have worked.


## Testing using Docker locally (advanced)

If SIVACOR does not work for you, you can either attempt to run it in Docker on your own computer, or skip this step entirely and revert back to the standard (manual) verification process. Installing and running Docker on your computer is straightforward (undergraduate students in the AEA Data Editor team have done this in under half an hour), but may not meet everybody's needs.

> ⚠️❗ **IMPORTANT:** If you do not have Docker installed on your computer, do not have the rights to install Docker on your computer, or do not have access otherwise to Docker, please do not attempt this, and skip straight [to the alternative approach](#fallback-run-on-a-different-computer).

> ⚠️❗ **IMPORTANT:** Do not provide us with a custom container that is not on the above list. Transparency requires that the container be built, using a `Dockerfile` or `apptainer.def` file, from publicly available sources. While we will happily use your container, it must be built from one of the above sources, or well-known "standard" sources, such as "Docker Official Images" in the Dockerhub `library` space (e.g., <https://hub.docker.com/_/python>).

### Steps

- Install the software necessary for running containers.
  - For Windows, install [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/).
  - For Mac, install [Docker Desktop for Mac](https://docs.docker.com/desktop/install/mac-install/) or [OrbStack](https://orbstack.dev).
  - For Linux, install Docker engine,  [Podman](https://podman.io/getting-started/installation), or use [Apptainer](https://apptainer.org/). These can all also be installed on Windows under Windows Subsystem for Linux (WSL).
- All example commands below are from a Bash or Zsh terminal, which are standard on Mac and Linux, as well as on Windows if using WSL. If you do not have WSL on Windows and are using PowerShell, the same principles apply, but the syntax may be different.


> When code has been adjusted as in Steps 1-4, no complex adjustment of containers is necessary.

- Run the container, mounting your project directory into the container. For example, if your project is in `/my/computer/users/me/project`, you would use a command such as this (example for Stata):

### Preliminaries

(may need some adjustment, depending on your license)

```bash
VERSION=18_5
TAG=2025-02-26
MYHUBID=dataeditors
MYIMG=stata${VERSION}
TYPE=mp   # must be set before CONTAINER, which expands ${TYPE}
CONTAINER=$MYHUBID/${MYIMG}-${TYPE}:${TAG}
STATALIC=/path/to/your/stata/stata.lic
```

Explanations:

- `VERSION`: This is the Stata version. StataNow is referenced with a `_5` suffix, otherwise, this corresponds to your (major) Stata version number.
- `TAG`: This is the date the container was built, in `YYYY-MM-DD` format. Recent Stata containers do not (on purpose) have a `latest` tag, but older ones (that are no longer maintained) do, and can replace the date with `latest`.
- `CONTAINER`: This is the fully qualified name of the container to be used. It is built from various components. For Stata images, these are maintained by `dataeditors` on Dockerhub. All available Stata containers and tags can be viewed on <https://hub.docker.com/u/dataeditors>. The precise way to call the container may depend on the version. For instance, for versions prior to `18`, the `-${TYPE}` suffix is not used.
- `STATALIC`: This is the path (in the notation used by the terminal you are using) to your Stata license file `stata.lic`. You need to have a valid Stata license file for the version of Stata you are using.

> If you have only an older license, or a non-MP license, you may need to replace `VERSION`, `TAG`, and `TYPE` accordingly. For instance, if you have a Stata 16 SE license, you would set `VERSION=16`, `TAG=2023-06-13`, and `TYPE=se`, and remove `-${TYPE}` from the `CONTAINER` definition.



### Test the container

```bash
docker run -it --rm \
  --volume ${STATALIC}:/usr/local/stata/stata.lic \
  --entrypoint stata-${TYPE} \
  ${CONTAINER}
```

You should see the usual Stata prompt. Type `exit` to leave Stata.

### Run the container

```bash
docker run -it --rm \
  --volume ${STATALIC}:/usr/local/stata/stata.lic \
  --volume $(pwd):/project \
  --workdir /project \
  --entrypoint stata-${TYPE} \
  ${CONTAINER} -b main.do
```

Use the command above for a **Scenario B** setup. For a **Scenario A** setup, change the working directory to `/project/code`:

```bash
docker run -it --rm \
  --volume ${STATALIC}:/usr/local/stata/stata.lic \
  --volume $(pwd):/project \
  --workdir /project/code \
  --entrypoint stata-${TYPE} \
  ${CONTAINER} -b main.do
```
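
The two commands differ only in the working directory. The following Python sketch makes that explicit by assembling the argument list for either scenario. The function name `docker_command` and its parameters are hypothetical; the flags are the ones used in the commands above.

```python
def docker_command(scenario, statalic, project_dir, container,
                   stata_type="mp", main_file="main.do"):
    """Build the `docker run` argument list for Scenario A or B."""
    # Scenario A: main.do is in code/; Scenario B: main.do is at the top level.
    workdir = "/project/code" if scenario == "A" else "/project"
    return [
        "docker", "run", "-it", "--rm",
        "--volume", f"{statalic}:/usr/local/stata/stata.lic",
        "--volume", f"{project_dir}:/project",
        "--workdir", workdir,
        "--entrypoint", f"stata-{stata_type}",
        container, "-b", main_file,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a terminal session, or printed and compared with the shell commands above.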



## Fallback: Run on a different computer

If you do not have, or cannot install, Docker, and you cannot use SIVACOR, use this alternative approach to test your code:

- Download your entire replication package from the draft openICPSR deposit, onto a **different computer** where you have not previously run the code.
- Run the code from that new location.
  - For Stata, close all Stata windows, and then double-click on the `main.do` file. This should generate a `main.log` file in the same directory as `main.do`.
  - For R, from a terminal or the RStudio **Terminal** tab, type `R CMD BATCH main.R`, or if using `renv`, `R --no-save --no-restore -f main.R > main.Rout`.[^noteshell]  This should generate a `main.Rout` file in the same directory as `main.R`.

[^noteshell]: In PowerShell, you can use `R --no-save --no-restore -f main.R | Out-File -Encoding UTF8 main.Rout`.

We note that in our experience, this approach is much less reliable.

## Success

If your code does run into problems, the generated `main.log` or `main.Rout` should have clues as to what went wrong. You should be able to fix these issues, and re-run the code in the container, until it runs without error.
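
For long Stata logs, a quick scan can locate the failure. Stata reports an error as a return code on its own line, such as `r(601);`. The sketch below, with the hypothetical function name `find_stata_errors`, collects each such code together with the few log lines that precede it.

```python
import re

# A Stata error appears in the log as a line like "r(601);"
ERROR_CODE = re.compile(r"^r\((\d+)\);$")


def find_stata_errors(log_text, context=3):
    """Return (error code, preceding lines) for each error found in the log."""
    lines = log_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        match = ERROR_CODE.match(line.strip())
        if match:
            found.append((int(match.group(1)), lines[max(0, i - context):i]))
    return found
```

An empty result from `find_stata_errors(open("main.log").read())` does not prove that the run was correct, but a non-empty one points directly at the command that failed.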


If your code runs without error, and produces all expected output files, you are done!
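
Checking for the expected output files can itself be scripted. A minimal sketch, assuming that all displays are written to the `results` directory and that you maintain the list of expected file names yourself (the function name `missing_outputs` is hypothetical):

```python
from pathlib import Path


def missing_outputs(rootdir, expected):
    """Return the expected result files that are absent or empty."""
    results = Path(rootdir) / "results"
    missing = []
    for name in expected:
        target = results / name
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(name)
    return missing
```

For instance, `missing_outputs(rootdir, ["table1.tex", "figure1.png"])` returns an empty list only when both files exist and are not empty.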


## Problems?

If you run into problems in Step 5, no worries, simply submit all the files as modified in Steps 1-4, along with the completed checklist, and we will handle the remaining issues.


---



### Finalize README

> Reference: <https://social-science-data-editors.github.io/template_README/template-README.html>

This step is usually not necessary, but you want to make sure that your README has the necessary information that helps set expectations about computational feasibility, based on the steps above.

- [**Software**:](https://social-science-data-editors.github.io/template_README/template-README.html#software-requirements) If you used a container, specify which container you used (name and tag, e.g., `dataeditors/stata18_5-mp:2025-02-26`). Be precise when describing the StataMP version - the number of cores matters! (`StataMP-4` may not behave the same way as `StataMP-8`.)
- [**Hardware**:](https://social-science-data-editors.github.io/template_README/template-README.html#memory-runtime-storage-requirements) Verify that the description of your computer (CPU, number of cores, RAM, disk space) is accurate.
- [**Run time**:](https://social-science-data-editors.github.io/template_README/template-README.html#memory-runtime-storage-requirements) Provide an estimate of the expected run time, however trivial it might be. It matters to the replicator!

**Examples:**

```
- OS: "openSUSE Leap 15.6"
- Processor:  13th Gen Intel(R) Core(TM) i7-1365U, 12 cores
- Memory available: 30GB memory
- Docker version 28.4.0-ce, build 249d679a6 
- stata version 18-mp-i (Docker image dataeditors/stata18-mp-i:2024-12-18) (born date: "18 Dec 2024") with 32 core license

Code ran for about 35 hours.
```

```
- OS: Windows Server AMD EPYC 7763 64-Core Processor 2.44 GHz, 128GB
- Stata/MP4 19.5 ("21 May 2025")
- MATLAB R2025a

Code runs about 10 minutes for Stata portion, and about 5 days for MATLAB portion.
```
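
Part of this information can be collected programmatically. The Python sketch below gathers the operating system, processor, and core count using only the standard library, and formats a measured run time. The function names are hypothetical, and installed memory is not included because the standard library has no portable way to query it.

```python
import os
import platform


def describe_system():
    """Collect basic system facts for the README."""
    return {
        "OS": platform.platform(),
        # platform.processor() can be empty on some systems; fall back to the architecture
        "Processor": platform.processor() or platform.machine(),
        "Cores": os.cpu_count(),
        "Python": platform.python_version(),
    }


def format_runtime(seconds):
    """Format a duration in seconds as hours, minutes, and seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"
```

Wrapping the main run between two calls to `time.perf_counter()` and passing the difference to `format_runtime` gives the run-time estimate requested above.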


### Submitting

You can now submit your replication package to the Data Editor, along with the completed checklist from above, and the generated `main.log`/`main.Rout` as evidence.

---



This document describes how to prepare your code for verification in detail, taking into account some of the most frequent issues that the Data Editor and his team have encountered in submitted replication packages.





## Detailed instructions

### Preliminary: Directory structure of a replication package

A generic replication package, housed at `/my/computer/users/me/project`, might have the following structure: 

```
README.pdf
data/
   raw/
      cps0001.dat
   analysis/
      combined_data.dta
      combined_data.csv
      combined_data_codebook.pdf
code/
  01_readcps.do
  02_readfred.do
  03_table1-5.do
  04_figures1-4.do
results/
  table1.tex
  table2.xlsx
  ...
  figure1.png
  figure2.pdf
```

where

- `data/raw` has the externally acquired raw data files (not modified by the authors)
- `data/analysis` has the processed data files, generated by the code in this repository. It starts out empty, and **may not exist.**
- `code` has the code files.
- `results` has all the results files.

For illustration purposes, we have used Stata `.do` files, and outputs in a variety of formats, but the same principles apply to other software, and to any output formats.

> Note that we did not specify where the `main.do` file will be! 

### Short-cut

> If you want to include the key code pieces for Stata that are needed to comply with Steps 1-3, you can use  [this code fragment](https://gist.github.com/larsvilhuber/d8b643a408d425ef2a80385b6377870d). Note that you do not HAVE to use this specific code, if your code already has equivalent features!

