… although some are not frequently asked, they might nevertheless be useful. Below are questions and answers in random order. Should you have other questions, please create a new issue on GitHub, ask the question on Twitter, or send an email to the AEA Data Editor.
Generically, each openICPSR project has a number (e.g., “109622”), which may show up in the right panel. Then
Give it a try:
If you created your own data (experiments, surveys, etc.), you should do one of two things:
… the directory structure has gotten a little clunky over the years working on this project…
The Data and Code Availability Policy says:
“Files uploaded to the AEA Data and Code Repository should retain the file names as originally executed or used, their original file format, and their original “grouping” in terms of directories.”
You should feel free to reorganize, but you should ensure that when we run the reorganized files, they produce the same results as reported in the paper. Put differently, the numbers in the paper should be produced by the reorganized files. We are not trying to reproduce your historical path to the paper, only the current state of the paper.
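One way to check this, sketched here under assumptions: run the original and the reorganized code so that each writes its tables and figures into its own output directory (the paths in the example are hypothetical), then compare the two directories file by file. The sketch relies on `diff -rq`, which works well for text outputs such as `.tex` or `.csv` tables; binary outputs such as PDFs often embed timestamps and may differ even when the numbers are identical.

```shell
# compare_outputs (hypothetical name): check whether two output directories hold identical files.
compare_outputs() {
  # diff -rq lists files that differ or exist in only one directory;
  # its exit status is 0 only when the two directory trees are identical
  if diff -rq "$1" "$2"; then
    echo "identical"
  else
    echo "outputs differ" >&2
    return 1
  fi
}

# Example (hypothetical paths):
# compare_outputs original/output reorganized/output
```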
Such restructuring may also be appropriate if you have a very sophisticated reproducible setup in your lab or group. A replicator does not need the dynamic setup scripts that are very useful in a lab but unnecessarily complicate the process for an outsider. Try to simplify the final setup so that anybody can easily run this particular project, once.
[MAC USERS ONLY] We are also not sure why, but the `__MACOSX` folders are a standard feature of ZIP files created on Mac OS X systems using the graphical user interface. Here’s a quick fix that helps all parties involved (adapted from this source):
Type `zip -d ` (note the space after `-d`), followed by the full path to your ZIP file and `"__MACOSX*"`.
The whole thing should look like this:
```
$ zip -d /Users/myname/Workspace/Folder/myzip.zip "__MACOSX*"
deleting: __MACOSX/
deleting: __MACOSX/myzip/
deleting: __MACOSX/myzip/._Proof_hi.pdf
deleting: __MACOSX/myzip/._README.pdf
deleting: __MACOSX/._myzip
```
You can now upload the file to openICPSR using the “Import from ZIP” functionality.
We should note that these folders do not show up in the public view of the repository once it is published. So while it is probably OK to leave them, it is better to remove them.
[Answer from ICPSR] I think it still makes sense to complete as much metadata as possible. The repository contains syntax files specific to data that are available through a restricted-use agreement. The metadata increase the findability of the data collection, even if only the syntax files are in the repository. It is useful to know that the data analyzed with the syntax cover a specific geographic area for a specific time period.
First, all sharing - whether privately with us, or publicly through the data publication process - should be in compliance with all IRB rules, data use agreements, etc. We will never ask you to share data that you do not have the right to share with us or anybody else.
Second, there is a difference between sharing with us, and publishing the data. We can accept private data sharing for the purpose of replication, conduct our reproducibility checks, and delete the data provided. You are in control of the publication of any data (though it has happened that we have had to point out to authors that they do not, in fact, have the rights to publish data that they were going to publish).
Third, the inability to publish the data does not relieve you of the obligation to create an archive of the data as used for the article. For private, confidential, or proprietary data, this archive should remain private, on your own systems or in appropriate university archives. But it must exist, so that you can reliably answer queries from authors in future years.
How should you proceed?
The best way to think of this is as a set of layers. Your working directory WD, from which you derived the tables and figures in the paper, is composed of confidential data CD, non-confidential data NCD, and programs/code P (and possibly temporary files TF). So WD = CD + NCD + P + TF. For the purpose of replication archives, you should create two archives: A.zip, containing the non-confidential data and the programs (NCD + P), and B.zip, containing the confidential data (CD).
You should then test: create an empty directory, unpack the two archives, and verify that they are sufficient:
(unzip A.zip) + (unzip B.zip) == NCD + P + CD == WD - TF
You should then import A.zip into the openICPSR archive, and ensure that B.zip is properly and securely archived, in compliance with all rules that you are subject to.
You can provide B.zip to us for the purpose of replication, but B.zip would not be published.
We are open to linking out to existing archives of code and data. However, GitHub, GitLab, and similar platforms are not archives! See the relevant section on the Social Science Data Editors pages.
Thus, in all cases, a proper archive needs to be created from the git repository. There are various ways:
```
unzip aea-de-guidance-20200129.zip && pushd aea-de-guidance-20200129 && zip -r ../upload.zip . && popd
```
Upload the resulting file (upload.zip) into the AEA Data and Code Repository project.
The above guidance does not preclude linking out to live code on such platforms. At present, we have no concrete plans, but we are considering various ways to make articles and their landing pages more informative. In the short term, treat a Github repo as any other URL, and cite it:
Lars Vilhuber. 2020. “AEADataEditor/aea-de-guidance: Unofficial guidance for authors [Github]”. https://github.com/AEADataEditor/aea-de-guidance. Accessed on March 11, 2020.
You can also elaborate more freely in the README.
First, packages on CRAN and in the Statistical Software Components (SSC) archive can be cited. AEA citation guidance is currently silent on software components, but it is not wrong to cite them, and other disciplines do so regularly. CRAN in fact has elements of a “proper archive” (SSC does NOT). All R packages can generate a (BibTeX) citation.
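In R, `utils::citation()` returns the citation for an installed package (generated from the package metadata when the package does not ship its own), and `utils::toBibtex()` converts it to BibTeX. A small shell wrapper, assuming R's `Rscript` is on the PATH; the function name `cite_r_package` is hypothetical.

```shell
# cite_r_package (hypothetical name): print the BibTeX citation of an installed R package.
# Requires R; citation() and toBibtex() are part of R's base "utils" package.
cite_r_package() {
  Rscript -e "print(utils::toBibtex(utils::citation('$1')))"
}

# Example: "base" returns the citation for R itself
# cite_r_package base
```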
Second, it is possible to submit such packages to various journals, where they are reviewed and published with a DOI:
The generic answer is yes. The key is to make it clear in the README how to run the software. Most economists know how to run Stata and Matlab, and can probably figure out how to run R or Julia even if it is not their native programming environment. For less standard software (GIS, SQL databases, Docker, Jupyter notebooks, cloud-based compute clusters), we suggest pointing to (citing) an introductory tutorial on the web in the README, and providing a barebones set of instructions on how to get started. The picture below illustrates what software can be considered “common” amongst economists:
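As part of those barebones instructions, a README can point to a quick check that the required software is installed before anything is run. A minimal POSIX shell sketch; the function name `check_software` and the program names in the example are hypothetical, so list whatever your own project actually needs.

```shell
# check_software (hypothetical name): report which required programs are on the PATH.
# Returns 0 if all programs are found, 1 otherwise.
check_software() {
  missing=0
  for prog in "$@"; do
    # command -v prints the location of a program and fails if it cannot be found
    if command -v "$prog" >/dev/null 2>&1; then
      echo "found:   $prog"
    else
      echo "MISSING: $prog" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check for the tools a hypothetical project uses
# check_software Rscript julia docker
```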
https://doi.org/10.3886/E123456V1. Copy that down.
Per the PSID website, you should include the following acknowledgement:
The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and R01 AG040213, and the National Science Foundation under award numbers SES 1157698 and 1623684.