World Bank Reproducible Research Repository Downloads

The download_worldbank.py script automatically downloads replication packages, verification reports, and documentation from the World Bank Reproducible Research Repository.

Overview

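This script resolves World Bank DOIs to download three key components: the replication package, the verification report, and the project documentation. Its command-line help summarizes the interface: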

Download files from World Bank Reproducible Research Repository

positional arguments:
  doi_or_id        World Bank repository identifier (DOI suffix, DOI, or DOI URL)

options:
  -h, --help       show this help message and exit
  --output OUTPUT  Output directory (default: current directory)
  --dry-run        Show what would be downloaded without actually downloading
  --version        show program's version number and exit
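
The help text above maps naturally onto a small argparse interface. The following is a minimal sketch of a parser that would produce similar help output. It is not the script's actual source: the function name build_parser and the version string are placeholders chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Download files from World Bank Reproducible Research Repository"
    )
    parser.add_argument(
        "doi_or_id",
        help="World Bank repository identifier (DOI suffix, DOI, or DOI URL)",
    )
    parser.add_argument(
        "--output",
        default=".",
        help="Output directory (default: current directory)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be downloaded without actually downloading",
    )
    # Placeholder version string; the real script's version is not stated here.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


# Example: parse the arguments used in the dry-run example.
args = build_parser().parse_args(["azav-8915", "--dry-run"])
```

Note that argparse exposes --dry-run as the attribute args.dry_run, so downstream code would branch on that flag before starting any transfer.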

Installation

The script requires Python 3.6+ with the requests library:

pip install requests

Usage

Basic Usage

(These examples assume that python3 invokes Python 3 on your system; substitute python if needed.)

# Download using DOI suffix
python3 tools/download_worldbank.py azav-8915

# Download using the full DOI URL
python3 tools/download_worldbank.py https://doi.org/10.60572/azav-8915

# Download using DOI without URL
python3 tools/download_worldbank.py 10.60572/azav-8915

Options

# Preview what will be downloaded (dry run)
python3 tools/download_worldbank.py azav-8915 --dry-run

# Download to specific directory
python3 tools/download_worldbank.py azav-8915 --output /path/to/downloads

# Show help
python3 tools/download_worldbank.py --help

Output Structure

Downloads are organized in a directory named wb-[DOI_SUFFIX]/:

wb-azav-8915/
├── README.pdf                                    # Project documentation
├── reproducibility-wb-azav-8915.2025-08-28.pdf   # Verification report (dated)
├── wb-azav-8915.zip                              # Original replication package
└── PP_WLD_2020_PRWP-9501_prg_v01/                # Extracted replication package
    ├── code/
    ├── data/
    └── ...

Examples

Example 1: Basic Download

$ python3 tools/download_worldbank.py azav-8915
🏷️  DOI suffix: azav-8915
📂 Output directory: wb-azav-8915
🌐 Resolving DOI: https://doi.org/10.60572/azav-8915
🔄 Redirect 1: https://doi.org/10.60572/azav-8915 -> https://reproducibility.worldbank.org/index.php/catalog/study/PP_WLD_2020_PRWP-9501_v01
✅ Final URL: https://reproducibility.worldbank.org/index.php/catalog/study/PP_WLD_2020_PRWP-9501_v01
🔍 Detected study URL format, checking for HTTP refresh redirect...
✅ Found HTTP Refresh header: 0;url=https://reproducibility.worldbank.org/index.php/catalog/39
🔄 Following HTTP refresh to: https://reproducibility.worldbank.org/index.php/catalog/39
✅ Successfully redirected to catalog: https://reproducibility.worldbank.org/index.php/catalog/39
📦 Catalog ID: 39
🔍 Found download IDs: ['81', '85']

📦 Found 2 file(s) to download:
⬇️  Downloading: README.pdf
✅ Downloaded: 180.4 KB of 180.4 KB (100.0%)
⬇️  Downloading: wb-azav-8915.zip  
✅ Downloaded: 58.4 MB of 58.4 MB (100.0%)
📦 Unzipping: wb-azav-8915.zip
✅ Unzipped to: wb-azav-8915

🎉 Download process completed!
✅ Successfully downloaded: 2 files

Example 2: Dry Run Preview

$ python3 tools/download_worldbank.py 101y-vn15 --dry-run
📦 Found 2 file(s) to download:
  1. Download ID 892: zip (329428 bytes)
     Content-Disposition: attachment; filename="README.pdf"
  2. Download ID 895: zip (202086235 bytes)
     Content-Disposition: attachment; filename="PP_CIV_2025_368.zip"

🔍 Dry run completed. No files were downloaded.

Technical Details

DOI Resolution Process

The script follows this resolution chain:

  1. DOI Redirect: https://doi.org/10.60572/azav-8915 → Study URL

  2. Study URL: /catalog/study/PP_WLD_2020_PRWP-9501_v01 (returns HTTP Refresh header)

  3. HTTP Refresh: 0;url=https://reproducibility.worldbank.org/index.php/catalog/39

  4. Catalog Page: /catalog/39 (contains download links)
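
Steps 2 to 4 of the chain hinge on reading a Refresh header and pulling the numeric catalog ID out of the target URL. The sketch below shows one way to do that with pure string handling. The function names are hypothetical, and the parsing rules are assumptions based on the header value shown above, not the script's own code. With requests, the header value would typically come from response.headers.get("Refresh"), since requests does not follow Refresh headers on its own.

```python
import re
from typing import Optional, Tuple


def parse_refresh_header(value: str) -> Tuple[float, Optional[str]]:
    """Split a Refresh header such as '0;url=https://...' into (delay, url)."""
    delay_part, _, rest = value.partition(";")
    delay = float(delay_part.strip() or 0)
    # The 'url=' key is matched case-insensitively; quotes around the URL are dropped.
    match = re.match(r"\s*url\s*=\s*(.+)", rest, flags=re.IGNORECASE)
    url = match.group(1).strip().strip("'\"") if match else None
    return delay, url


def catalog_id_from_url(url: str) -> Optional[str]:
    """Return the numeric ID from a .../catalog/<id> URL, or None if absent."""
    match = re.search(r"/catalog/(\d+)/?$", url)
    return match.group(1) if match else None


header = "0;url=https://reproducibility.worldbank.org/index.php/catalog/39"
delay, target = parse_refresh_header(header)
catalog_id = catalog_id_from_url(target)  # "39" for the header above
```

A study URL such as /catalog/study/PP_WLD_2020_PRWP-9501_v01 yields None from catalog_id_from_url, which is one way to tell that the Refresh step still has to be followed.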

File Type Detection

The script identifies files using HTTP Content-Disposition headers and content types:
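
As an illustration of the header-based approach, the sketch below extracts a filename from a Content-Disposition value like the ones shown in the dry-run output. It relies on the standard library's email.message.Message parser; the function name is hypothetical, and the script itself may parse the header differently.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename()


name = filename_from_content_disposition('attachment; filename="README.pdf"')
is_zip = name is not None and name.lower().endswith(".zip")
```

Checking the extension of the reported filename, as in the last line, is one simple way to decide whether a download should be unzipped afterwards.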

Progress Indicators

Supported Input Formats

Format      Example                               Description
DOI Suffix  azav-8915                             Just the unique identifier
DOI         10.60572/azav-8915                    Standard DOI format
DOI URL     https://doi.org/10.60572/azav-8915    Full URL
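
The three formats can be normalized to a bare suffix with a single regular expression. This is a minimal sketch, not the script's implementation: the function name is hypothetical, and the assumed suffix shape (four alphanumeric characters, a hyphen, four more) is inferred only from the examples azav-8915 and 101y-vn15.

```python
import re

# Assumed suffix shape, inferred from the documented examples.
_SUFFIX = r"[a-z0-9]{4}-[a-z0-9]{4}"
# Optional "10.60572/" prefix, itself optionally preceded by a doi.org URL.
_PATTERN = re.compile(
    rf"^(?:(?:https?://(?:dx\.)?doi\.org/)?10\.60572/)?({_SUFFIX})$",
    re.IGNORECASE,
)


def extract_doi_suffix(identifier: str) -> str:
    """Return the DOI suffix for a suffix, DOI, or DOI URL; raise on anything else."""
    match = _PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"Could not extract DOI suffix from: {identifier}")
    return match.group(1).lower()


suffixes = [
    extract_doi_suffix(value)
    for value in (
        "azav-8915",
        "10.60572/azav-8915",
        "https://doi.org/10.60572/azav-8915",
    )
]
```

All three inputs normalize to the same suffix, while a string such as invalid-doi raises ValueError under this assumed pattern.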

Error Handling

The script handles common issues gracefully:

Integration

Continuous Integration

The script detects CI environments via the CI environment variable and:

Git Integration

For automated workflows, the script can be integrated with git operations:

# Download and add to git
python3 tools/download_worldbank.py azav-8915
git add wb-azav-8915/
git commit -m "Add World Bank replication package azav-8915"

Troubleshooting

Common Issues

Directory already exists

# Error: wb-azav-8915 already exists - please remove prior to downloading
rm -rf wb-azav-8915
python3 tools/download_worldbank.py azav-8915
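
In an automated workflow, the same condition can be checked before invoking the script. The sketch below is a hypothetical helper, not part of the tool: it builds the wb-[DOI_SUFFIX] path under a base directory and refuses to proceed if that path already exists, mirroring the error message above.

```python
import tempfile
from pathlib import Path


def prepare_output_dir(base: Path, doi_suffix: str) -> Path:
    """Create base/wb-<suffix>, refusing to reuse an existing directory."""
    target = Path(base) / f"wb-{doi_suffix}"
    if target.exists():
        raise FileExistsError(
            f"{target.name} already exists - please remove prior to downloading"
        )
    target.mkdir(parents=True)
    return target


# Usage in a throwaway directory: the first call succeeds, the second fails.
with tempfile.TemporaryDirectory() as tmp:
    created = prepare_output_dir(Path(tmp), "azav-8915")
    try:
        prepare_output_dir(Path(tmp), "azav-8915")
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True
```

Failing early like this avoids an unconditional rm -rf in scripts where the existing directory might hold local changes.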

Network connectivity issues

Invalid DOI format

# Error: Could not extract DOI suffix from: invalid-doi
# Use one of these formats:
python3 tools/download_worldbank.py azav-8915                    # DOI suffix
python3 tools/download_worldbank.py 10.60572/azav-8915          # DOI
python3 tools/download_worldbank.py https://doi.org/10.60572/azav-8915  # DOI URL

Limitations

Version Information

This script is part of the replication template toolkit and follows semantic versioning. Check the repository for the latest version and updates.