Last updated: April 2, 2026
All of the following scripts are available either in the $HOME/bin directory (once you have run the bash setup) or in the tools/ folder of each repository.
Data Download and Synchronization Tools¶
download_box_private.py¶
Python script for downloading files from private Box folders using JWT authentication.
download_dv.py¶
Python script for downloading complete datasets from Dataverse repositories as ZIP archives using DOI.
download_openicpsr-private.py¶
Python script for downloading files from private (unpublished) openICPSR deposits with authentication.
download_openicpsr-public.py¶
Python script for downloading files from public (published) openICPSR deposits.
download_osf.sh¶
Bash script for downloading all files and directories from Open Science Framework (OSF) projects.
download_sivacor.py¶
Python script that downloads SIVACOR submission artifacts, handles ZIP extraction, and commits the results to a git branch.
get_sivacor_info.py¶
Python script for extracting computing environment and timing information from SIVACOR JSONLD files. Can output to stdout or automatically update replication reports.
download_zenodo_draft.py¶
Python script for downloading files from Zenodo draft deposits that require authentication.
download_zenodo_public.sh¶
Bash script for downloading files from public Zenodo repositories using the zenodo_get tool.
list_box_files.py¶
Lists files from a private Box folder using JWT authentication and outputs results to a text file.
sync-codeocean.sh¶
Synchronizes CodeOcean capsules with local repositories, maintaining both live Git clones and static copies.
zenodo_get_ci.py¶
CI-friendly wrapper for zenodo_get that suppresses the animated progress bar in automated pipelines.
File Format Conversion Tools¶
convert_eps.sh¶
Bash script that recursively converts EPS (Encapsulated PostScript) files to PNG format using ImageMagick.
convert_graphs.do¶
Stata script that converts GPH graph files to PDF and PNG formats.
csv2md.py¶
Python tool for converting arbitrary CSV files to Markdown format.
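As an illustration of the kind of conversion described here, the following is a minimal sketch, not the code of csv2md.py itself. The function name csv_to_markdown is hypothetical, and the sketch assumes the first CSV row is the header.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a pipe-delimited Markdown table.

    Assumption: the first row holds the column headers.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes so a cell cannot break the table, and pad short rows.
        cells = [cell.replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown("variable,mean\nwage,12.5\nage,41\n"))
```

The actual tool may handle alignment, quoting, or file input differently; check its help output before relying on any of these details.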
matlab_convert_fig.m¶
MATLAB script that converts .fig files to PNG format, processing all figure files in the current directory.
matlab_convert_mat2csv.m¶
MATLAB script that converts .mat files to CSV format, extracting all variables as separate CSV files.
mk_tex_table.sh¶
Converts standalone LaTeX table files to complete PDF documents with comprehensive formatting packages.
Tools to check for various things¶
These are usually not run directly; the pipelines run them.
Stata_scan_code/¶
Directory containing Stata code scanning tools and packages for analyzing Stata scripts and dependencies.
check_ipynb_order.py¶
Python script that verifies Jupyter notebook code cells were executed in sequential order for reproducibility.
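The check can be pictured with the sketch below. It is an assumption about the approach, not the script's actual logic: it reads the execution_count field that the notebook JSON format stores for each code cell and flags cells whose count does not increase. The name out_of_order_cells is hypothetical, and treating unexecuted cells as failures is a choice made for this sketch.

```python
import json


def out_of_order_cells(notebook_json: str) -> list[int]:
    """Return positions (among code cells) that break increasing execution order.

    Assumptions for this sketch: unexecuted cells (execution_count is null)
    count as out of order, and gaps in the numbering are tolerated.
    """
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    flagged = []
    previous = 0
    for position, count in enumerate(counts):
        if count is None or count <= previous:
            flagged.append(position)
        else:
            previous = count
    return flagged


if __name__ == "__main__":
    example = json.dumps({
        "cells": [
            {"cell_type": "code", "execution_count": 1},
            {"cell_type": "markdown"},
            {"cell_type": "code", "execution_count": 3},
            {"cell_type": "code", "execution_count": 2},
        ]
    })
    print(out_of_order_cells(example))
```

An empty result means the code cells ran top to bottom; a stricter variant could also require the counts to be exactly 1, 2, 3, and so on.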
check_r_deps.R¶
R script that finds and outputs all R package dependencies as CSV from a project directory.
check_rds_files.R¶
R script for checking RDS files (R data files), designed to run automatically without manual changes.
doi_validator.py¶
Python module to validate DOI links and convert between formats for Harvard Dataverse DOIs.
find_cran_date.py¶
Python tool that determines the minimum CRAN snapshot date for pinned R packages and reports matching Docker images.
install.R¶
R package installation utility with version control; provides a pkgTest() function to install and require packages.
scan_pkg.jl¶
Julia package scanner that identifies and lists packages used in Julia files via using and import statements.
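To show what scanning for using and import statements involves, here is a rough sketch written in Python rather than Julia. It is not the logic of scan_pkg.jl: the regular expression, the function name julia_packages, and the handling of selectors, aliases, and submodules are all assumptions made for illustration.

```python
import re

# Matches lines that start with `using` or `import` and captures the rest.
_STATEMENT = re.compile(r"^\s*(?:using|import)\s+(.+)$", re.MULTILINE)


def julia_packages(source: str) -> list[str]:
    """List top-level package names found in Julia source text (heuristic)."""
    packages = set()
    for match in _STATEMENT.finditer(source):
        clause = match.group(1).split("#", 1)[0]  # drop a trailing comment
        clause = clause.split(":", 1)[0]          # drop "Pkg: name1, name2" selectors
        for item in clause.split(","):
            name = item.strip().split(" as ")[0].strip()  # drop "as Alias"
            root = name.split(".")[0]             # keep the top-level module only
            if root:                              # relative imports (".Sub") are skipped
                packages.add(root)
    return sorted(packages)


if __name__ == "__main__":
    code = "using DataFrames, CSV\nimport Statistics: mean\nusing LinearAlgebra.BLAS # BLAS calls\n"
    print(julia_packages(code))
```

A line-based heuristic like this one will miss statements inside strings or multi-line comments, which a real scanner may need to handle.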
summarize_data.py¶
Python script that summarizes data metadata by directory level, aggregating file sizes from a CSV file.
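Aggregating by directory level could look like the sketch below. The column names path and size, the depth parameter, and the function name size_by_directory are hypothetical; the real script's input layout is not documented here.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_directory(csv_text: str, depth: int = 1) -> dict[str, int]:
    """Sum file sizes per directory, truncated to `depth` path levels.

    Assumption: the CSV has a header row with `path` and `size` columns.
    Files at the top level are grouped under ".".
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts = PurePosixPath(row["path"]).parent.parts[:depth]
        key = "/".join(parts) or "."
        totals[key] += int(row["size"])
    return dict(totals)


if __name__ == "__main__":
    listing = (
        "path,size\n"
        "data/raw/a.csv,10\n"
        "data/clean/b.csv,5\n"
        "code/run.do,2\n"
        "README.md,1\n"
    )
    print(size_by_directory(listing, depth=1))
    print(size_by_directory(listing, depth=2))
```

Raising depth gives a finer breakdown, which is one way to read "by directory levels" in the description above.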
Ad-hoc Data Analysis and Comparison Tools¶
compare_manifests.py¶
Python script that compares two SHA256 manifest files to identify overlaps in filenames, checksums, and complete records.
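The three kinds of overlap can be sketched with set intersections. This is an illustration only: it assumes manifests in the two-column layout written by sha256sum (checksum, whitespace, filename), and the function names parse_manifest and compare_manifests are hypothetical rather than taken from the script.

```python
def parse_manifest(text: str) -> set[tuple[str, str]]:
    """Parse `checksum  filename` lines into a set of (checksum, filename) pairs."""
    records = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        checksum, name = line.split(maxsplit=1)
        # sha256sum prefixes the name with "*" in binary mode; drop it.
        records.add((checksum, name.lstrip("*").strip()))
    return records


def compare_manifests(first: str, second: str) -> dict[str, set]:
    """Report filenames, checksums, and full records present in both manifests."""
    rec_a, rec_b = parse_manifest(first), parse_manifest(second)
    return {
        "same_name": {name for _, name in rec_a} & {name for _, name in rec_b},
        "same_checksum": {chk for chk, _ in rec_a} & {chk for chk, _ in rec_b},
        "same_record": rec_a & rec_b,
    }


if __name__ == "__main__":
    # Shortened placeholder checksums, for readability only.
    old = "aaa  data/x.csv\nbbb  code/run.do\n"
    new = "aaa  data/x.csv\nccc  code/run.do\nbbb  code/old.do\n"
    for kind, items in compare_manifests(old, new).items():
        print(kind, sorted(items))
```

A filename that matches with a different checksum suggests a modified file, while a checksum that matches under a different name suggests a renamed or duplicated file.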
generate_png_diff.sh¶
Generates visual diffs for modified PNG images by comparing them against their git repository versions.
summarize_diff_stats.py¶
Parses and summarizes statistical differences from files, extracting numerical values and filenames.
Pipeline and Workflow Tools¶
pipeline-steps1-4.sh¶
Combined pipeline script that handles multiple steps of the openICPSR download process.
run_scanner.sh¶
Runs the Stata code scanner on an ICPSR directory; reads its configuration and executes the scanning operations.
sbatch-shell.sh¶
SLURM batch job script template for running Stata jobs on HPC clusters with resource specifications.
JIRA Integration Tools¶
These tools integrate with the AEA Data Editor Jira system for task tracking and metadata extraction.
jira_add_comment.py¶
Posts comments to Jira issues using the Jira API with support for wiki markup formatting.
jira_find_task_by_icpsr.py¶
Finds the highest-numbered Jira Task issue for a given openICPSR project ID.
jira_get_info.py¶
Retrieves various information fields from Jira issues including DOIs, openICPSR URLs, and SIVACOR IDs.
Configuration and Setup Tools¶
linux-system-info.sh¶
System information collector that displays OS details, processor info, and memory availability.
update_tools.sh¶
Tool updater that downloads the latest replication template files from GitHub and copies them to the template directory.
Document Processing Tools¶
prepare-revision.py (inactive)¶
Processes Markdown files by replacing code block content in Appendix sections while maintaining headers.
Configuration Files¶
requirements-scanner.txt¶
Python requirements file for scanner tools.
requirements.txt¶
Python requirements file for general tools.
template.tex¶
LaTeX template file for document generation.