DCAS v1.0 Compliance Self-Assessment

On 2022-12-15, various data editors, including myself, launched the Data and Code Availability Standard (DCAS). In this post, I will do a self-assessment of my claim that the AEA’s Data and Code Availability Policy (DCAP) complies with DCAS. The assessment is based on the DCAS v1.0 rules, compared to the DCAP as of January 2023.


Section: Data

Data Availability Statement

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| A Data Availability Statement is provided with detailed enough information such that an independent researcher can replicate the steps needed to access the original data, including any limitations and the expected monetary and time cost of data access. | § 2: "Authors ... must provide ... information about the data, ... sufficient to permit replication, as well as information about access to data ...." § 6: "the replication materials shall include... (b) description sufficient to access all data at their original source location," § 9: "...The data availability statement shall provide detailed information on how, where, and under what conditions an independent researcher can access the original source data, as well as author-generated derivative data, and must be explicit and accurate about any restrictions, requirements, payments, and processing delays. The data availability statement shall provide information to assure the reader that the data are available for a sufficiently long period of time." |

Comment: The AEA policy is expressed more generally, but de facto we interpret it as requiring information about limitations (time, location, person type) and costs (time, money), and we generally request that information from authors if it is not provided.

Raw data

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Raw data used in the research (primary data collected by the author and secondary data not otherwise available) is made publicly accessible. Exceptions are explained under Rule 1. | § 6: "the replication materials shall include (a) the data set(s)," |

Comment: The AEA policy is not explicit about providing both raw and derivative data.

Analysis data

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Analysis data is provided as part of the replication package unless they can be fully reproduced from accessible data within a reasonable time frame. Exceptions are explained under Rule 1. | § 6: "the replication materials shall include (a) the data set(s)," |

Comment: In practice, raw data must always be provided. If the Data Editor observes that generating the derivative data takes a very long time, or that derivative data can be provided even though the raw data are confidential, the Data Editor will request that analysis data be provided. Providing ONLY the analysis data is NOT sufficient when raw data are available and can be provided.

Format

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| The data files are provided in any format compatible with commonly used statistical package or software. Some journals require data files in open, non-proprietary formats. | § 11: Formats/Data: The data files may be provided in any format compatible with any commonly used statistical package or software. Authors are encouraged to provide data files in open, non-proprietary formats. |

Comment: We do not (yet) require that data (also) be provided in non-proprietary formats. In general, source data are rarely in proprietary formats, but analysis data may be. We interpret “non-proprietary format” to mean that open source (free) software can read the provided data. This is not the same as requiring explicitly open, archive-friendly formats (Library of Congress). For instance, Stata data can be reliably read by Python and R code, and we will accept that. We will generally push back on MATLAB-formatted data, since they are hard to read with such software.
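As a rough illustration of what "open source software can read the provided data" can mean in practice, here is a minimal sketch that writes a small Stata file and reads it back with Python. It assumes the pandas library is installed; the file name `analysis.dta`, the helper `roundtrip_stata`, and the toy values are made up for the example and are not part of any AEA tooling.

```python
import tempfile
from pathlib import Path

import pandas as pd


def roundtrip_stata(df: pd.DataFrame) -> pd.DataFrame:
    """Write df to a temporary Stata file and read it back with pandas."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "analysis.dta"  # hypothetical file name
        df.to_stata(path, write_index=False)
        # read_stata loads the whole file into memory before the
        # temporary directory is cleaned up.
        return pd.read_stata(path)


# Toy data standing in for an analysis file; the values are invented.
toy = pd.DataFrame({"id": [1, 2, 3], "wage": [12.5, 17.0, 9.75]})
restored = roundtrip_stata(toy)
print(restored)
```

If a data file in a replication package loads this way without Stata itself, it would meet the reading of "non-proprietary" described in the comment above, even though `.dta` is not an archival format.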

Metadata

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Description of variables and their allowed values are publicly accessible. | § 11: Formats/Data: Authors should ensure that a meaningful name or description (label) is available for every variable in the provided datasets. Codebooks or similar metadata should describe the allowed values and their meaning for each variable. It is acceptable to reference publicly available documentation for these items. |

Comment: This requirement can intersect with the data format. It is acceptable to have data formats (compliant with the previous DCAS rule) that include meaningful value and variable labels. When the data format itself does not support such self-documentation, separate codebooks, or references (citations!) to codebooks, are requested. This is somewhat difficult to enforce, and compliance may not be fully in line with the policy in all cases.
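To make the idea of a self-documenting data format concrete, the following sketch stores variable labels and value labels inside a Stata file and then extracts them again, using pandas' Stata reader. It assumes pandas is installed; the helper `read_labels`, the file name, the variable names, and the label texts are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_labels(path):
    """Return (variable labels, value labels) stored inside a Stata file."""
    with pd.read_stata(path, iterator=True) as reader:
        return reader.variable_labels(), reader.value_labels()


# Toy data: a numeric variable and a categorical one (values invented).
toy = pd.DataFrame(
    {
        "educ": [12, 16, 10],
        "sex": pd.Categorical(["female", "male", "female"]),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "labelled.dta"  # hypothetical file name
    toy.to_stata(
        path,
        write_index=False,
        # Variable labels travel inside the file itself.
        variable_labels={"educ": "Years of schooling", "sex": "Respondent sex"},
    )
    variable_labels, value_labels = read_labels(path)

print(variable_labels)
print(value_labels)
```

A file whose labels come back empty from such a check is a candidate for a separate codebook, or a citation to one.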

Citation

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| All data used in the paper are cited. | § 8: All source data used in the paper shall be cited, following the AEA Sample References. |


Section: Code

Data transformation

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Programs used to create any final and analysis data sets from raw data are included. | § 2: "Authors ... must provide ... information about the ... programs, and other details of the computations sufficient to permit replication ...." § 6: "the replication materials shall include... (c) the programs used to create any final and analysis data sets from raw data," |

Analysis

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Programs producing the computational results (estimation, simulation, model solution, visualization) are included. | § 2: (as before) § 6: "the replication materials shall include... (d) programs used to run the final models," |

Format

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Code is provided in source format that can be directly interpreted or compiled by appropriate software. | § 12: Formats/Code: The programs may be provided in any format compatible with commonly used statistical package or software. |

Comment: While the AEA policy does not explicitly specify “directly interpreted… by software”, it is understood that “compatible” means the code must both run and be readable by the software. A PDF of a Python program cannot be (easily) interpreted by a Python interpreter. A Word document of Stata code cannot be directly read by Stata.
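The difference between source format and a rendered copy can be checked mechanically. The sketch below is only an illustration for Python code, not a description of how the AEA Data Editor screens packages: it asks whether the bytes of a file decode as text and parse as Python. The function name `is_python_source` and both sample inputs are invented for the example.

```python
import ast


def is_python_source(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 text and parses as Python code."""
    try:
        ast.parse(raw.decode("utf-8"))
    except (UnicodeDecodeError, SyntaxError, ValueError):
        # Binary content, null bytes, or text that is not valid Python.
        return False
    return True


# A genuine source file versus the first bytes of a PDF "print-out".
script = b"x = 1\nprint(x)\n"
pdf_header = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(is_python_source(script))
print(is_python_source(pdf_header))
```

Parsing only shows that the file is in source format; it says nothing about whether the program runs or reproduces the results.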


Section: Supporting materials

Instruments

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| If collecting original data through surveys or experiments, survey instruments or experiment instructions as well as details on subject selection are included. | § 7: For papers collecting original data through surveys or experiments, the replication materials shall also include (f) survey instruments or experiment instructions, (g) computer code for experiment or survey collection mechanisms, and (h) original instructions and details on subject selection. See the supplementary Policy on Experimental and Survey Papers. |

Comment: Some of this information may also be provided in Online appendices. The AEA policy is a bit more stringent in that we explicitly treat “experiment instructions” as code when the experiment was conducted using software, though that is, in fact, in line with the overall requirement to have code when code was used.

Ethics

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| If applicable, details are shared about ethics approval. | § 16: If applicable, approval by ethics boards—the Institutional Review Board (IRB) in the United States and equivalent institutions elsewhere—should be demonstrated by including the name of the ethics board and any approval or exemption record number in the title footnote and the author disclosure statement(s). See the Disclosure Policy. |

Comment: The AEA policy is a bit more stringent, in that it specifies exactly what to share about the ethics approval. It is both listed as part of the Disclosure Policy, and mentioned in the “title footnote.”

Pre-registration

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| If applicable, pre-registration of the research is identified and cited. | § 15: It is the policy of the AEA that randomized control trials must be registered on the RCT Registry. All such registrations shall be cited in the title footnote and elsewhere in the paper as appropriate. Please see the RCT Registry policy. |

Documentation

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| A README document is included, containing a Data Availability Statement, listing all software and hardware dependencies and requirements (including the expected run time), and explaining how to reproduce the research results. The README follows the schema provided by the [Social Science Data Editors’ template README](https://social-science-data-editors.github.io/template_README/). | § 13: As part of the archive, authors must provide a README file listing all included files and documenting the purpose, format, and provenance of each file provided, as well as instructing a user on how replication can be conducted. The README shall contain the data availability statement and proper citations for all data used. The README shall follow the schema provided by the Social Science Data Editors' template README. |

Comment: While the policy does not explicitly mention “software and hardware dependency and requirements (… run time…)”, it directly references the template README, which does specify those elements.
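Authors who want a quick self-check before submitting could scan their README for the elements named in the DCAS rule. The sketch below is a naive keyword search, offered as an illustration only: the element names and regular expressions in `REQUIRED_ELEMENTS` are assumptions made for this example, not the headings prescribed by the template README, and the sample README text is invented.

```python
import re

# Hypothetical keyword patterns, loosely based on the elements the DCAS
# rule lists. They are NOT the official template README headings.
REQUIRED_ELEMENTS = {
    "data availability statement": r"data availability",
    "software requirements": r"software",
    "hardware requirements": r"hardware|memory|cores",
    "expected run time": r"run ?time|running time",
    "reproduction instructions": r"instructions|how to (reproduce|replicate)",
}


def missing_elements(readme_text: str) -> list[str]:
    """List the elements whose keyword pattern never appears in the text."""
    return [
        name
        for name, pattern in REQUIRED_ELEMENTS.items()
        if not re.search(pattern, readme_text, flags=re.IGNORECASE)
    ]


# Invented README excerpt that forgets to describe the hardware used.
sample_readme = """\
## Data Availability Statement
All data are publicly available from the sources cited below.
## Computational requirements
Software: Stata 17
Runtime: about 2 hours
## Instructions to Replicators
Run main.do
"""

print(missing_elements(sample_readme))
```

A keyword match does not show that a section is complete or accurate; it only flags elements that appear to be absent altogether.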


Section: Sharing

Location

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| Data and programs are archived by the authors in the repositories deemed acceptable by the journal. | § 3: Data and programs should be archived in the AEA Data and Code Repository. Footnote: Other repositories and archives may be acceptable, as long as these are considered to be "trusted" archives or repositories, see guidance. The AEA Data Editor will assess suitability of any such repositories and archives. |

License

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| A license specifies the terms of use of code and data in the replication package. The license allows for replication by researchers unconnected to the original parties. | § 13: ... The README shall follow the schema provided by the Social Science Data Editors' template README. |

Comment: This rule is included by reference - the License is an element for both data and code in the template README - as well as implicitly. The AEA Data and Code Repository defaults to a CC-BY license, and any deviation is scrutinized by the Data Editor to allow researchers unconnected to the original parties to access maximum information, while remaining compliant with restrictions and terms of use.

Omissions

| What the DCAS says | What the AEA DCAP says |
| --- | --- |
| The README clearly indicates any omission of the required parts of the package due to legal requirements or limitations or other approved agreements. | § 3: The Editor should be notified at the time of submission if access to the data used in a paper is restricted or limited, or if, for some other reason, the requirements above cannot be met. § 4: If data or programs cannot be published in an openly accessible trusted data repository, authors must commit to preserving data and code for a period of no less than five years following publication of the manuscript, and to providing reasonable assistance to requests for clarification and replication. § 6: "the replication materials shall include... (b) description sufficient to access all data at their original source location," |

Comment: This rule is implicit in several different paragraphs of the AEA policy. The policy explicitly calls out situations where data may not be accessible or publishable. Authors must explicitly (publicly) commit to preserving materials, and supporting replication attempts. Deviations from the minimum duration are documented in the README. Implicit in 6(b) is the fact that a full description of how to access the data will generally also state that the data cannot be re-published. Finally, the license should be clear about this as well.

A few last notes

The current AEA policy was created before the release of the Data and Code Availability Standard. As the AEA and other journals align on a common standard, we will likely bring the wording of our policy closer to that of the standard, making compliance easier for authors who are less well versed in these matters.

I welcome any feedback.