Description¶
Validates DOI links and converts them between different formats. The module is designed specifically for Harvard Dataverse DOIs (prefix 10.7910/DVN): it extracts the dataset tag and determines the server URL from any of the supported input formats.
Usage¶
from doi_validator import doi_validate
# Call the validation function on any supported DOI format
doi_string = "10.7910/DVN/JKFYMJ"
result = doi_validate(doi_string)
This is a Python module designed to be imported and used in other scripts, not a standalone command-line tool.
Supported Input Formats¶
Full DOI URL:
https://doi.org/10.7910/DVN/JKFYMJ
Dataverse URL:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JKFYMJ
Short DOI:
10.7910/DVN/JKFYMJ
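One way to reduce the three formats to a single form before extracting the tag is sketched below. This is not the module's actual implementation: `to_bare_doi` is a hypothetical helper, and the handling of the `persistentId` query parameter is an assumption based on the Dataverse URL shown above.

```python
from urllib.parse import urlparse, parse_qs


def to_bare_doi(text):
    """Hypothetical helper: reduce a supported input to a bare DOI string.

    Returns None when the input matches none of the three formats.
    """
    text = text.strip()
    parsed = urlparse(text)

    # Full DOI URL: https://doi.org/10.7910/DVN/JKFYMJ
    if parsed.netloc == "doi.org":
        return parsed.path.lstrip("/") or None

    # Dataverse URL: the DOI sits in the persistentId query parameter
    if parsed.scheme in ("http", "https"):
        values = parse_qs(parsed.query).get("persistentId", [])
        if values and values[0].startswith("doi:"):
            return values[0][len("doi:"):]
        return None

    # Short DOI: already bare, e.g. 10.7910/DVN/JKFYMJ
    if text.startswith("10."):
        return text
    return None
```

The sketch only normalizes the input; checking the prefix and the tag is a separate step.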
Return Values¶
Valid tag (string) - 6-character alphanumeric tag (e.g., JKFYMJ)
"invalid format" (string) - Returned if the DOI format is invalid or the tag is malformed
SERVER_URL - Set to the Dataverse server URL when applicable
Example¶
from doi_validator import doi_validate
# Validate various formats
tag1 = doi_validate("https://doi.org/10.7910/DVN/JKFYMJ")
# Returns: "JKFYMJ"
tag2 = doi_validate("10.7910/DVN/AB123C")
# Returns: "AB123C"
tag3 = doi_validate("https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XY89ZW")
# Returns: "XY89ZW"
tag4 = doi_validate("invalid-doi")
# Returns: "invalid format"
Requirements¶
Python >= 3.6
Standard library only (no external dependencies)
Validation Rules¶
Tag must be exactly 6 characters
Tag characters must be digits (0-9) or uppercase letters (A-Z)
DOI must use the 10.7910/DVN prefix (Harvard Dataverse)
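The rules listed above can be expressed as a single regular expression. The following is a minimal sketch under that assumption, not the code used by `doi_validate`; `extract_tag` and `_TAG_PATTERN` are hypothetical names. It checks only that the input ends with the Harvard Dataverse prefix followed by a well-formed tag, so it does not verify the rest of a URL.

```python
import re

# Assumed pattern: the Harvard Dataverse prefix followed by exactly six
# characters drawn from 0-9 and A-Z, anchored at the end of the string.
_TAG_PATTERN = re.compile(r"10\.7910/DVN/([0-9A-Z]{6})$")


def extract_tag(doi):
    """Hypothetical stand-in for the validation rules listed above."""
    match = _TAG_PATTERN.search(doi.strip())
    return match.group(1) if match else "invalid format"
```

Lowercase tags, tags of the wrong length, and DOIs with a different prefix all fall through to the "invalid format" string in this sketch.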
Use Cases¶
Extract Dataverse dataset tags from various URL formats
Validate DOI links before processing
Convert between DOI representation formats
Parse Dataverse URLs in download scripts
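For the format-conversion use case, a validated tag can be expanded back into each representation listed under Supported Input Formats. The sketch below is illustrative only: `tag_to_formats` and the three constants are hypothetical, and the server is assumed to be fixed to Harvard Dataverse.

```python
# Assumed URL templates, taken from the formats listed under
# "Supported Input Formats"; the server is fixed to Harvard Dataverse here.
DOI_PREFIX = "10.7910/DVN"
RESOLVER_URL = "https://doi.org"
DATAVERSE_URL = "https://dataverse.harvard.edu"


def tag_to_formats(tag):
    """Hypothetical helper: build all three representations from a valid tag."""
    doi = f"{DOI_PREFIX}/{tag}"
    return {
        "short": doi,
        "doi_url": f"{RESOLVER_URL}/{doi}",
        "dataverse_url": f"{DATAVERSE_URL}/dataset.xhtml?persistentId=doi:{doi}",
    }
```

A download script could, for instance, pass the tag returned by `doi_validate` to such a helper to obtain the Dataverse URL for a dataset.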
See Also¶
download_dv.py - Uses DOI validation to download from Dataverse