Description¶
This script downloads files from a private Box folder using JWT authentication. It’s designed for secure access to private research data stored in Box with proper authentication credentials.
Usage¶
python tools/download_box_private.py [SUBFOLDER]
Arguments¶
SUBFOLDER (optional) - Subfolder identifier; files are downloaded from ‘aearep-SUBFOLDER’. This corresponds to the tag of the main Jira ticket, such as aearep-1234. If omitted, it is deduced from the current directory name.
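The fallback described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `resolve_subfolder`, the acceptance of an already-prefixed argument, and the strict `aearep-<digits>` pattern are all assumptions.

```python
import re
from pathlib import Path
from typing import Optional

PREFIX = "aearep-"  # folder prefix taken from the argument description


def resolve_subfolder(arg: Optional[str] = None, cwd: Optional[Path] = None) -> str:
    """Return the Box subfolder name, e.g. 'aearep-1234' (hypothetical helper)."""
    if arg:
        # Assumption: accept either '1234' or an already-prefixed 'aearep-1234'.
        return arg if arg.startswith(PREFIX) else f"{PREFIX}{arg}"
    # No argument given: fall back to the name of the current directory.
    name = (cwd or Path.cwd()).name
    if re.fullmatch(rf"{re.escape(PREFIX)}\d+", name) is None:
        raise ValueError(f"Cannot deduce subfolder from directory name {name!r}")
    return name


if __name__ == "__main__":
    print(resolve_subfolder("1234"))
    print(resolve_subfolder(cwd=Path("/path/to/aearep-1234")))
```

Raising an error when the directory name does not match is one possible design choice; it avoids silently downloading from an unintended folder.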
Example¶
python tools/download_box_private.py 1234 # Download from subfolder 'aearep-1234'
or
cd /path/to/aearep-1234
python tools/download_box_private.py # Download from subfolder 'aearep-1234'
Requirements¶
Python >= 3.9
boxsdk: Box Python SDK
Valid Box JWT application credentials
Installing Box SDK¶
The Box SDK can be installed with conda, pip, or pipx.
The dependencies listed above can be installed by executing
pip install -r requirements.txt
in any recent Bitbucket repository whose “tools” were updated after June 7, 2025.
Ideally, these should be installed in your main Python environment, since you will be re-using this regularly. You can also install in a virtual environment.
If using conda, you can install boxsdk with:
conda install boxsdk
conda install boxsdk[jwt]
Box is currently migrating from boxsdk to box-sdk-gen. This script works only with boxsdk, not with the newer box-sdk-gen.
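To see which of the two packages an environment actually provides, a check along the following lines may help. It is a sketch only: the function name `box_sdk_status` and the returned messages are illustrative, and it relies on the packages' import names being `boxsdk` and `box_sdk_gen`.

```python
import importlib.util
from typing import Callable


def box_sdk_status(find_spec: Callable = importlib.util.find_spec) -> str:
    """Report whether the legacy boxsdk package is importable (hypothetical helper)."""
    has_legacy = find_spec("boxsdk") is not None    # required by this script
    has_gen = find_spec("box_sdk_gen") is not None  # newer SDK, not supported here
    if has_legacy:
        return "ok: boxsdk is installed"
    if has_gen:
        return "wrong package: only box-sdk-gen found, install boxsdk"
    return "missing: boxsdk is not installed"


if __name__ == "__main__":
    print(box_sdk_status())
```

The `find_spec` parameter exists only so the check can be exercised without installing either package.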
Using Right Credentials¶
To permanently set the proper credentials on BioHPC, you can modify your ~/.bashrc profile to include the Box environment variables. These values (or the (SECRETURL)) can be obtained from a supervisor.
Download Environment Variables File¶
Download the environment variable setup file onto BioHPC:
wget (SECRETURL) -O ~/envvars.txt
The first time, adjust your ~/.bashrc to read the envvars.txt:
echo "source ~/envvars.txt" >> ~/.bashrc
Then, load the variables into your current session:
source ~/.bashrc
This ensures that all required Box environment variables are available in your session.
Python Environment Setup¶
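To run the script, you need a newer version of Python (>= 3.10). There are two ways to get one: standard Python on a BioHPC compute node, or a conda environment created from a login node.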
You can use standard Python on BioHPC compute nodes. Get a terminal through Slurm:
srun --pty bash -l
Load the necessary modules, and install the necessary packages:
module load python/3.12.7
cd /path/to/aearep-1234 # adjust to the project at hand
pip install -r requirements.txt
You will need to run this every time from a compute node, as the more recent Python versions are not available on the login nodes.
Create a Conda Environment (from BioHPC login node)
To avoid conflicts with existing Python installations, create a dedicated conda environment:
Load conda if needed
module load anaconda # or follow BioHPC-specific instructions to enable conda
Create environment with Python 3.11
conda create --name download python=3.11
Activate the environment
You will need to do this every time after you log in, if you intend to use the script:
conda activate download
Install Box SDK
Follow the conda install instructions from above to install the necessary packages.
⚠️ Important: Some newer versions of boxsdk are incompatible. If you get import errors (No module named ‘boxsdk’), uninstall the current version and install a compatible version:
conda uninstall boxsdk
conda install -c conda-forge boxsdk=3.14.0
Install Additional Dependencies
If you encounter authentication errors related to JWT, install pyjwt:
pip install pyjwt
Environment Variables¶
Required for authentication:
BOX_FOLDER_PRIVATE - Box folder ID to download from
BOX_PRIVATE_KEY_ID - Box JWT public key ID
BOX_ENTERPRISE_ID - Box enterprise ID
BOX_CLIENT_ID - Box client ID (optional if using config file)
BOX_CLIENT_SECRET - Box client secret (optional if using config file)
Optional configuration:
BOX_CONFIG_PATH - Directory containing the Box config file
BOX_OUTPUT_DIR - Directory to download files to (default: ./restricted)
BOX_PRIVATE_JSON - Base64 encoded content of the Box config JSON file
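Before running the script, it can be useful to confirm that the variables listed above are set. The sketch below is one way to do that. The helper name `missing_box_vars` is hypothetical, and treating a set BOX_CONFIG_PATH or BOX_PRIVATE_JSON as "using a config file" is an assumption about how the script behaves.

```python
import os
from typing import List, Mapping

# Names taken from the lists above.
REQUIRED = ("BOX_FOLDER_PRIVATE", "BOX_PRIVATE_KEY_ID", "BOX_ENTERPRISE_ID")
REQUIRED_WITHOUT_CONFIG = ("BOX_CLIENT_ID", "BOX_CLIENT_SECRET")
CONFIG_SOURCES = ("BOX_CONFIG_PATH", "BOX_PRIVATE_JSON")


def missing_box_vars(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    names = list(REQUIRED)
    # Assumption: client ID and secret are only needed when no config source is set.
    if not any(env.get(key) for key in CONFIG_SOURCES):
        names += REQUIRED_WITHOUT_CONFIG
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_box_vars(os.environ)
    print("Missing:", ", ".join(missing) if missing else "none")
```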
Base64 Configuration¶
To provide the entire Box configuration as a base64-encoded string:
# Generate base64 config
cat config.json | base64
# Set environment variable
export BOX_PRIVATE_JSON="base64_encoded_string_here"
Authentication Methods¶
1. Environment Variables¶
Set individual Box API credentials as environment variables.
2. Config File¶
Place the Box JWT configuration file in the directory specified by BOX_CONFIG_PATH.
3. Base64 Encoded Config¶
Provide the entire configuration as a base64-encoded string in BOX_PRIVATE_JSON.
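The base64 method amounts to a simple round trip, sketched below with the standard library only. The function names and the placeholder configuration are illustrative assumptions, not the script's own code or real credentials.

```python
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a config dictionary to a base64 string (as `base64` would for config.json)."""
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


def decode_config(encoded: str) -> dict:
    """Decode a BOX_PRIVATE_JSON-style value back into a dictionary."""
    # b64decode ignores line breaks by default, so wrapped shell output also decodes.
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


if __name__ == "__main__":
    placeholder = {"enterpriseID": "0000", "boxAppSettings": {"clientID": "placeholder"}}
    encoded = encode_config(placeholder)
    print(decode_config(encoded) == placeholder)
```

With boxsdk, a decoded dictionary of this kind can typically be passed to `JWTAuth.from_settings_dictionary`; whether the script does so is not stated here.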
Features¶
JWT Authentication: Secure access using Box JWT authentication
Flexible Configuration: Multiple authentication methods for different environments
Organized Downloads: Downloads to organized folder structure
Error Handling: Comprehensive error handling for API failures
Logging: Configurable logging for debugging and monitoring
Output Structure¶
restricted/ # Default output directory (configurable)
└── # Downloaded files from specified subfolder
├── file1.txt
├── file2.pdf
└── ...
Box Application Setup¶
To use this script, a technical person / supervisor needs to set up the following:
Box Developer Account: Create at https://developer.box.com/
JWT Application: Create a new JWT application in Box Developer Console
Authentication Keys: Download the JWT configuration file
Folder Access: Ensure the application has access to the target folder
Security Considerations¶
Store JWT credentials securely
Use environment variables in production
Limit application permissions to necessary scopes
Regularly rotate authentication credentials