Various helpful scripts - AEA Replication Template

Last updated: April 2, 2026

Each of the following scripts is available either in bash, through the $HOME/bin directory once you have run the bash setup, or in the tools/ folder of each repository.

Data Download and Synchronization Tools

download_box_private.py

Python script for downloading files from private Box folders using JWT authentication.

Links: Source | Help

download_dv.py

Python script that downloads complete datasets from Dataverse repositories as ZIP archives, given the dataset's DOI.

Links: Source | Help
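A minimal sketch of the download step, assuming the standard Dataverse Data Access API endpoint that returns every file in a dataset as a single ZIP (/api/access/dataset/:persistentId/). The function names and the DOI-normalization rules are hypothetical, and download_dv.py itself may work differently. Restricted files additionally need an API token, which this sketch sends in the X-Dataverse-key header.

```python
"""Hypothetical sketch: fetch a whole Dataverse dataset as a single ZIP by DOI."""
import shutil
import urllib.parse
import urllib.request
from typing import Optional

# Prefixes this sketch strips before re-adding the "doi:" form Dataverse expects
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Return the DOI as 'doi:10.xxxx/...', whatever form it was given in."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "doi:" + doi


def dataset_zip_url(server: str, doi: str) -> str:
    """Build the Data Access API URL that returns all files of a dataset as a ZIP."""
    query = urllib.parse.urlencode({"persistentId": normalize_doi(doi)})
    return f"{server.rstrip('/')}/api/access/dataset/:persistentId/?{query}"


def download_dataset(server: str, doi: str, dest: str, token: Optional[str] = None) -> None:
    """Stream the ZIP to dest; the API token is only needed for restricted files."""
    headers = {"X-Dataverse-key": token} if token else {}
    request = urllib.request.Request(dataset_zip_url(server, doi), headers=headers)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, download_dataset("https://dataverse.harvard.edu", "https://doi.org/10.7910/DVN/XXXXXX", "dataset.zip") would save the archive locally, where XXXXXX is a placeholder for a real dataset identifier.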

download_openicpsr-private.py

Python script for downloading files from private (unpublished) openICPSR deposits, which requires authentication.

Links: Source | Help

download_openicpsr-public.py

Python script for downloading files from public (published) openICPSR deposits.

Links: Source | Help

download_osf.sh

Bash script for downloading all files and directories from Open Science Framework (OSF) projects.

Links: Source | Help

download_sivacor.py

Python script that downloads SIVACOR submission artifacts, extracts ZIP archives, and commits the results to a git branch.

Links: Source | Help

get_sivacor_info.py

Python script that extracts computing-environment and timing information from SIVACOR JSON-LD files. It can print to stdout or update replication reports automatically.

Links: Source | Help

download_zenodo_draft.py

Python script for downloading files from Zenodo draft deposits that require authentication.

Links: Source | Help

download_zenodo_public.sh

Bash script for downloading files from public Zenodo repositories using the zenodo_get tool.

Links: Source | Help

list_box_files.py

Lists files from a private Box folder using JWT authentication and outputs results to a text file.

Links: Source | Help

sync-codeocean.sh

Synchronizes CodeOcean capsules with local repositories, maintaining both live Git clones and static copies.

Links: Source | Help

zenodo_get_ci.py

CI-friendly wrapper for zenodo_get that suppresses the animated progress bar in automated pipelines.

Links: Source | Help

File Format Conversion Tools

convert_eps.sh

Bash script that recursively converts EPS (Encapsulated PostScript) files to PNG format using ImageMagick.

Links: Source | Help

convert_graphs.do

Stata script that converts GPH graph files to PDF and PNG formats.

Links: Source | Help

csv2md.py

Python tool for converting arbitrary CSV files to Markdown format.

Links: Source | Help
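A minimal sketch of CSV-to-Markdown conversion, assuming the target is a Markdown pipe table whose first CSV row becomes the header. That is a common convention, not necessarily what csv2md.py does. Ragged rows are padded and literal pipes escaped so that cell contents cannot break the table.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Convert CSV text to a Markdown pipe table (first row = header)."""
    # Skip completely blank lines, which csv.reader returns as empty lists
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(row):
        # Escape pipes so a cell's content cannot split the column
        return "| " + " | ".join(cell.replace("|", "\\|").strip() for cell in row) + " |"

    header, body = rows[0], rows[1:]
    lines = [fmt(header), "|" + "|".join(["---"] * width) + "|"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)
```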

matlab_convert_fig.m

MATLAB script that converts .fig files to PNG format, processing all figure files in the current directory.

Links: Source | Help

matlab_convert_mat2csv.m

MATLAB script that converts .mat files to CSV format, extracting all variables as separate CSV files.

Links: Source | Help

mk_tex_table.sh

Converts standalone LaTeX table files into complete documents, loading a broad set of formatting packages, and compiles them to PDF.

Links: Source | Help

Checking and Scanning Tools

These are usually not run directly; they are invoked by the Pipelines.

Stata_scan_code/

Directory containing Stata code scanning tools and packages for analyzing Stata scripts and dependencies.

Links: Source | Help

check_ipynb_order.py

Python script that verifies, for reproducibility, that a Jupyter notebook's code cells were executed in sequential order.

Links: Source | Help
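A sketch of the kind of check check_ipynb_order.py performs, assuming the standard nbformat 4 JSON layout, in which each code cell records an execution_count (null if the cell never ran). The function names and the strict mode are illustrative, not the script's actual interface.

```python
import json


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def execution_counts(notebook: dict) -> list:
    """Execution counts of the code cells, top to bottom (None = never run)."""
    return [cell.get("execution_count")
            for cell in notebook.get("cells", [])
            if cell.get("cell_type") == "code"]


def executed_in_order(notebook: dict, strict: bool = False) -> bool:
    counts = execution_counts(notebook)
    if strict:
        # One clean top-to-bottom run in a fresh kernel yields exactly 1, 2, 3, ...
        return counts == list(range(1, len(counts) + 1))
    # Otherwise only require the cells that did run to have increasing counts
    ran = [n for n in counts if n is not None]
    return all(a < b for a, b in zip(ran, ran[1:]))
```

The strict mode is the stronger guarantee: increasing but non-consecutive counts show that some cells were run more than once or out of view.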

check_r_deps.R

R script that finds all R package dependencies in a project directory and outputs them as CSV.

Links: Source | Help

check_rds_files.R

R script for checking RDS files (serialized R data), designed to run automatically without manual edits.

Links: Source | Help

doi_validator.py

Python module that validates Harvard Dataverse DOI links and converts them between formats.

Links: Source | Help
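A sketch of DOI validation and conversion, assuming Harvard Dataverse DOIs have the form 10.7910/DVN/ followed by an alphanumeric identifier, and that three notations need converting: the doi: form, the https://doi.org/ resolver URL, and the Dataverse landing-page URL. The pattern and function names are hypothetical, not doi_validator.py's interface.

```python
import re

# Assumed shape of a Harvard Dataverse DOI, optionally preceded by a known prefix
_DATAVERSE_DOI = re.compile(
    r"^(?:doi:"
    r"|https?://(?:dx\.)?doi\.org/"
    r"|https?://dataverse\.harvard\.edu/dataset\.xhtml\?persistentId=doi:)?"
    r"(10\.7910/DVN/[A-Z0-9]+)$",
    re.IGNORECASE,
)


def parse_dataverse_doi(text: str) -> str:
    """Return the bare DOI (10.7910/DVN/...) or raise ValueError."""
    match = _DATAVERSE_DOI.match(text.strip())
    if match is None:
        raise ValueError(f"not a Harvard Dataverse DOI: {text!r}")
    # DOIs are case-insensitive, so upper-casing gives one canonical spelling
    return match.group(1).upper()


def doi_formats(text: str) -> dict:
    """Return the same DOI in each supported notation."""
    doi = parse_dataverse_doi(text)
    return {
        "doi": f"doi:{doi}",
        "url": f"https://doi.org/{doi}",
        "landing": f"https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:{doi}",
    }
```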

find_cran_date.py

Python tool that determines the minimum CRAN snapshot date for pinned R packages and reports matching Docker images.

Links: Source | Help
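The reasoning behind a minimum snapshot date can be sketched under a stated assumption: a dated CRAN snapshot contains a pinned version only if that version had been published by the snapshot date and had not yet been replaced by a newer release. The earliest usable snapshot is then the latest publication date among the pinned versions, and the first later release of any pinned package closes the window. This illustrates the idea only; it is not find_cran_date.py's code, and the package names and dates below are made up.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# For each package: (publication date of the pinned version,
#                    publication date of the next version, or None if still current)
Pins = Dict[str, Tuple[date, Optional[date]]]


def snapshot_window(pins: Pins) -> Tuple[date, Optional[date]]:
    """Return (earliest, end): snapshots from earliest up to, but not including,
    end contain every pinned version. end is None if the window is open-ended."""
    earliest = max(published for published, _ in pins.values())
    replaced = [nxt for _, nxt in pins.values() if nxt is not None]
    end = min(replaced) if replaced else None
    if end is not None and end <= earliest:
        raise ValueError("no single CRAN snapshot contains all pinned versions")
    return earliest, end
```

With hypothetical pins {"pkgA": (date(2023, 3, 1), date(2023, 11, 1)), "pkgB": (date(2023, 8, 1), None)}, the window runs from 2023-08-01 up to 2023-11-01.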

install.R

R package installation utility with version control; provides a pkgTest() function that installs and requires (loads) packages.

Links: Source | Help

scan_pkg.jl

Julia package scanner that identifies and lists the packages that Julia files load through "using" and "import" statements.

Links: Source | Help
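The same idea, scanning source text for using and import statements, can be sketched in Python. This is a simplified, regex-based assumption about how such a scanner can work, not scan_pkg.jl's code: it keeps only the top-level package name, skips relative modules such as .LocalModule, ignores commented-out lines, and misses statements that continue onto a second line.

```python
import re
from typing import Set

# A using/import statement at the start of a line (after optional indentation)
_STATEMENT = re.compile(r"^\s*(?:using|import)\s+(.+)$", re.MULTILINE)


def julia_packages(source: str) -> Set[str]:
    """Return top-level package names referenced by using/import statements."""
    found = set()
    for match in _STATEMENT.finditer(source):
        clause = match.group(1).split("#", 1)[0]  # drop a trailing comment
        clause = clause.split(":", 1)[0]          # "using A: f, g" -> "A"
        for item in clause.split(","):
            # "A.B" -> "A" and "A as B" -> "A"; ".Local" becomes "" and is skipped
            name = item.strip().split(" as ")[0].split(".")[0]
            if name:
                found.add(name)
    return found
```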

summarize_data.py

Python script that summarizes data metadata by directory level, aggregating the file sizes listed in a CSV file.

Links: Source | Help
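A sketch of aggregation by directory level, assuming the CSV has one row per file with a path column and a size column in bytes; those column names and the function itself are hypothetical. With level=1 the sizes are totalled per top-level folder, with level=2 per second-level folder, and files at the root are reported under ".".

```python
import csv
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable


def summarize_by_level(lines: Iterable[str], level: int = 1) -> Dict[str, int]:
    """Total the 'size' column per directory prefix of depth `level`.

    `lines` can be an open CSV file or any iterable of CSV lines.
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(lines):
        # Directory components only; the file name itself is dropped
        dirs = PurePosixPath(row["path"]).parent.parts
        key = "/".join(dirs[:level]) or "."
        totals[key] += int(row["size"])
    return dict(totals)
```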

Ad-hoc Data Analysis and Comparison Tools

compare_manifests.py

Python script that compares two SHA256 manifest files to identify overlaps in filenames, checksums, and complete records.

Links: Source | Help
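A sketch of the three comparisons, assuming each manifest uses the sha256sum output format: a checksum, whitespace, and a file name, one file per line. The parsing rules and names are assumptions rather than compare_manifests.py's code.

```python
from typing import Dict, Set, Tuple

Record = Tuple[str, str]  # (checksum, file name)


def parse_manifest(text: str) -> Set[Record]:
    """Parse 'checksum  filename' lines as written by sha256sum."""
    records = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, _, name = line.strip().partition(" ")
        # sha256sum prefixes the name with '*' in binary mode
        records.add((checksum.lower(), name.strip().lstrip("*")))
    return records


def compare_manifests(first: str, second: str) -> Dict[str, set]:
    """Overlaps by file name, by checksum, and by complete record."""
    a, b = parse_manifest(first), parse_manifest(second)
    return {
        "filenames": {n for _, n in a} & {n for _, n in b},
        "checksums": {c for c, _ in a} & {c for c, _ in b},
        "records": a & b,  # same file name AND same checksum
    }
```

Keeping the three overlaps separate is what makes the output useful: a shared checksum under different names points to a renamed or duplicated file, while a shared name with different checksums points to a file that changed.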

generate_png_diff.sh

Generates visual diffs for modified PNG images by comparing them against their versions in the git repository.

Links: Source | Help

summarize_diff_stats.py

Parses and summarizes statistical differences from files, extracting numerical values and filenames.

Links: Source | Help

Pipeline and Workflow Tools

pipeline-steps1-4.sh

Combined pipeline script that handles multiple steps of the openICPSR download process.

Links: Source | Help

run_scanner.sh

Runs the Stata code scanner on an ICPSR directory: it reads the configuration and executes the scanning operations.

Links: Source | Help

sbatch-shell.sh

SLURM batch job script template, with resource specifications, for running Stata jobs on HPC clusters.

Links: Source | Help

Jira Integration Tools

These tools integrate with the AEA Data Editor Jira system for task tracking and metadata extraction.

jira_add_comment.py

Posts comments to Jira issues using the Jira API with support for wiki markup formatting.

Links: Source | Help

jira_find_task_by_icpsr.py

Finds the highest-numbered Jira Task issue for a given openICPSR project ID.

Links: Source | Help
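Picking the highest-numbered issue needs a numeric comparison of issue keys, because compared as plain strings PROJ-999 sorts after PROJ-1000. A minimal sketch of that selection step, assuming keys of the usual PROJECT-NUMBER form; PROJ is a placeholder project key, and the Jira search that produces the candidate keys is not shown.

```python
from typing import Iterable, Optional


def highest_numbered(keys: Iterable[str]) -> Optional[str]:
    """Return the issue key with the largest numeric suffix, or None if there are none."""
    keys = list(keys)
    if not keys:
        return None
    # Compare the number after the last '-' as an integer, not as text
    return max(keys, key=lambda key: int(key.rsplit("-", 1)[1]))
```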

jira_get_info.py

Retrieves various information fields from Jira issues including DOIs, openICPSR URLs, and SIVACOR IDs.

Links: Source | Help

jira_download_attachments.py

Downloads all attachments from Jira issues with their original filenames, with support for filtering and list-only mode.

Links: Source | Help

Configuration and Setup Tools

linux-system-info.sh

System information collector that displays OS details, processor info, and memory availability.

Links: Source | Help

update_tools.sh

Tool updater that downloads the latest replication template files from GitHub and copies them to the template directory.

Links: Source | Help

Document Processing Tools

prepare-revision.py (inactive)

Processes Markdown files by replacing code block content in Appendix sections while maintaining headers.

Links: Source | Help

Configuration Files

requirements-scanner.txt

Python requirements file for scanner tools.

Links: Source

requirements.txt

Python requirements file for general tools.

Links: Source

template.tex

LaTeX template file for document generation.

Links: Source