Introduction

This guide helps you understand and use the automated replication verification pipeline. The pipeline downloads replication packages from data repositories, analyzes code and data files, scans for dependencies and issues, and generates comprehensive reports.

Supported Repositories

The pipeline can automatically download from:

Available Pipelines

Standard Pipelines

1-populate-from-icpsr

What it does:

Parameters:

When it’s done:


w-big-populate-from-icpsr (Large Deposit Pipeline)

When to use:

What it does: Same as the standard pipeline, but optimized for large deposits.

How it’s different:


Utility Pipelines

2-merge-report (Merge Report Sections)

When to use: After the split report sections have been completed separately

What it does:


3-split-report (Split Report)

When to use: You need to work on the report in separate sections

What it does:


4-refresh-tools (Update Tools)

When to use:

What it does:


5-rename-directory (Rename Deposit)

When to use: You need to rename a deposit directory

What it does:

Parameters:


6-convert-eps-pdf (Convert Graphics)

When to use: You need to convert graphics for easier viewing and comparison

What it does:

Parameters:


7-download-box-manifest (Download Box Data)

When to use: You need to download restricted data from Box and create manifests

What it does:

Parameters:

Requirements:

Note: This pipeline is specifically for handling restricted/confidential data that cannot be stored in public repositories.


Execution Pipelines

z-run-stata (Run Stata Code) (BETA)

When to use: You need to execute Stata replication code

What it does:

Requirements:

Resources: 2x (8GB RAM)

Typical duration: Varies based on code (30 minutes to several hours)


z-run-any-big (BETA)

When to use:

What it does: Same as z-run-stata, but also allows R jobs and runs with maximum resources

Resources: 8x (32GB RAM, 16 vCPU)

Cost: Uses significantly more build minutes
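To make the cost difference concrete, here is a minimal sketch that compares the two execution pipelines. The size multipliers (2x, 8x) and memory figures come from the "Resources" lines above. The rule that build minutes scale in proportion to the size multiplier is an assumption, not something this guide states, and the function names are hypothetical. Check your CI provider's billing rules before relying on the numbers.

```python
# Size multipliers as listed under "Resources" for each execution pipeline.
SIZE_MULTIPLIER = {"z-run-stata": 2, "z-run-any-big": 8}

# Implied by the guide's figures: 2x = 8GB and 8x = 32GB.
GB_PER_SIZE_UNIT = 4


def memory_gb(pipeline: str) -> int:
    """Return the RAM (in GB) implied by the pipeline's size multiplier."""
    return SIZE_MULTIPLIER[pipeline] * GB_PER_SIZE_UNIT


def estimated_build_minutes(pipeline: str, wall_clock_minutes: float) -> float:
    """Estimate billed build minutes.

    ASSUMPTION: minutes are billed in proportion to the size multiplier.
    """
    return SIZE_MULTIPLIER[pipeline] * wall_clock_minutes


if __name__ == "__main__":
    for name in SIZE_MULTIPLIER:
        print(
            f"{name}: {memory_gb(name)} GB RAM, "
            f"~{estimated_build_minutes(name, 30):.0f} build minutes for a 30-minute run"
        )
```

Under that assumption, the same 30-minute job would cost four times as many build minutes on z-run-any-big as on z-run-stata, so the larger pipeline is best reserved for jobs that actually need the extra memory or CPU.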


Understanding Pipeline Output

After a successful pipeline run, you’ll find:

Main Report

Generated Directory

All analysis outputs in generated/:
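For a quick overview of what a run produced, the contents of generated/ can be summarized with a short script. This is a sketch only: the directory name comes from the text above, while the function name and the grouping by file extension are illustrative choices, and nothing here depends on which specific files the pipeline writes.

```python
from collections import Counter
from pathlib import Path


def summarize_outputs(root: str = "generated") -> Counter:
    """Count the files under `root`, grouped by lowercase file extension.

    Returns an empty Counter if the directory does not exist.
    """
    base = Path(root)
    if not base.is_dir():
        return Counter()
    return Counter(
        path.suffix.lower() or "(no extension)"
        for path in base.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Print extensions from most to least common.
    for extension, count in summarize_outputs().most_common():
        print(f"{extension}: {count} file(s)")
```

Run it from the repository root after the pipeline finishes. An empty result suggests that the run did not write any outputs or that the script was started from a different directory.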

Deposit Directory