Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

download_dv.py - Download datasets from Dataverse repositories

Description

This script downloads complete datasets from Dataverse instances (like Harvard Dataverse) using their DOI (Digital Object Identifier) and the Dataverse API. It creates organized directory structures and automatically extracts downloaded ZIP files.

Usage

python3 tools/download_dv.py --doi DOI [--server_url URL] [--output PATH]

Arguments

Examples

# Download from Harvard Dataverse (default)
python3 tools/download_dv.py --doi "doi:10.7910/DVN/ABC123"

# Download from custom Dataverse instance
python3 tools/download_dv.py --doi "doi:10.7910/DVN/ABC123" --server_url "https://dataverse.example.edu"

# Specify custom output directory
python3 tools/download_dv.py --doi "doi:10.7910/DVN/ABC123" --output "./downloads"

Features

API Usage

Output Structure

Input DOI: doi:10.7910/DVN/ABC123
Output directory: ./dv-DVN-ABC123/
Downloaded file: ./dv-DVN-ABC123/ABC123.zip (extracted automatically)

Error Handling

Requirements

How It Works

  1. DOI Parsing: Extracts publisher and dataset ID from DOI

  2. API Query: Connects to Dataverse API to get dataset metadata

  3. Directory Creation: Creates organized output directory structure

  4. Download: Downloads complete dataset as ZIP file

  5. Extraction: Automatically extracts ZIP contents

  6. Cleanup: Maintains organized file structure for further processing

This tool is essential for reproducible research workflows that rely on datasets hosted in Dataverse repositories.