Data publications & reproducibility

Publishing data and software

Papers published in data journals describe in detail the collection, curation, and analysis of datasets. The papers may be peer-reviewed depending on the journal. Similar journals exist for describing the development and functionality of research software. You may consider publishing a data or software paper as a companion to a traditional journal article or as a standalone publication.

If you simply want to share your data or code in a data repository to obtain a DOI, consider ReDATA or another data repository.

Examples of data journals

Scientific Data (Nature)
Methods (Nature): lab methods
Elsevier Data in Brief
Geoscience Data Journal
Earth System Science Data
GigaScience: big data from the life and biomedical sciences; open access, open data, open peer review
CODATA Data Science Journal: journal on the management, dissemination, use, and reuse of research data and databases across all research domains
Data Science: focuses on data mining, machine learning, scientific computing, and reproducibility
Also see "Data journals: A survey", which lists 116 data journals from 15 publishers, organized by subject (Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015), Data journals: A survey. J Assn Inf Sci Tec, 66: 1747–1762. doi:10.1002/asi.23358)

Examples of software journals

Data and software journals

F1000Research: publish all your findings, including null results, data notes, and more
ReScience C: encourages explicit replication of already published research

Methodology and protocols

Protocols.io: share scientific methods, computational workflows, procedures, etc.
Elsevier MethodsX

Example policies

This is a selection of policies that illustrates what top journals require.

Reproducible research

Moving research from hidden, unusable, and irreproducible to fully Findable, Accessible, Interoperable, and Reusable (FAIR) has several facets. These include, but are not limited to, following data management and sharing best practices.

One facet that greatly aids in making research FAIR is adopting tools and practices that make it easier for others to re-execute computational analyses. Re-executing an analysis helps to

Increase the defensibility of conclusions through transparent and open research
Increase the ability to verify results
Enable re-use of all or parts of the software and data in new research

Better software practices for reproducible research

Writing better software improves research reproducibility. Practices include

Organizing data and code appropriately. See the suggested folder structures in Data organization.
Using scripting or other automation instead of manual processing to ensure tasks can be repeated.
Avoiding hard-coded configuration (file paths, parameter values, etc.).
Avoiding absolute file paths. Use relative paths instead for portability.
Using a version control system (Git, SVN, etc.).
Attaching an appropriate license to software. The MIT License is recommended because it is simple and maximizes reusability. Choosealicense.com can help you select an appropriate license. For software with commercial potential, consult TechLaunch Arizona before releasing it as open source.
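As a rough illustration of the advice on hard-coded configuration and absolute paths, the Python sketch below takes its parameters from defaults, an optional JSON file, or the command line, and resolves every path relative to the script's own folder at run time. It is only one way to do this; the file names (data/raw/measurements.csv, results), the threshold parameter, and the JSON config format are hypothetical placeholders, not a prescribed layout.

```python
import argparse
import json
from pathlib import Path

# Resolve paths relative to this script instead of hard-coding an absolute
# path such as /home/alice/project (hypothetical). Notebooks and interactive
# sessions have no __file__, so fall back to the current working directory.
if "__file__" in globals():
    PROJECT_ROOT = Path(__file__).resolve().parent
else:
    PROJECT_ROOT = Path.cwd()

# Hypothetical default settings; replace them with your own project's values.
DEFAULTS = {
    "input_file": "data/raw/measurements.csv",
    "output_dir": "results",
    "threshold": 0.5,
}


def load_config(config_path=None):
    """Return the defaults, overridden by values from an optional JSON file."""
    config = dict(DEFAULTS)
    if config_path is not None:
        with open(config_path, encoding="utf-8") as fh:
            config.update(json.load(fh))
    return config


def resolve(relative_path, root=PROJECT_ROOT):
    """Turn a project-relative path into an absolute path at run time."""
    return (root / relative_path).resolve()


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example analysis step")
    parser.add_argument("--config", help="JSON file that overrides the defaults")
    parser.add_argument("--threshold", type=float, help="override one parameter")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    config = load_config(args.config)
    # Command-line values take precedence over the config file and defaults.
    if args.threshold is not None:
        config["threshold"] = args.threshold
    input_path = resolve(config["input_file"])
    output_dir = resolve(config["output_dir"])
    print(f"Reading {input_path}; writing to {output_dir}; threshold={config['threshold']}")
    return config


if __name__ == "__main__":
    main()
```

For example, running the script (saved under a hypothetical name such as analysis.py) with --threshold 0.8 changes only that value, and because no absolute paths appear in the code, a collaborator can run it unchanged from their own copy of the project folder.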

Johns Hopkins University has prepared a 22-minute online tutorial, consisting of six modules, that explains the above practices in more detail.

Tools for enabling reproducible analyses

These are listed for informational purposes and do not imply endorsement.

CATEGORY	EXAMPLES	NOTES
Reproducibility Platforms	CyVerse (Discovery Environment), Code Ocean, WholeTale, MyBinder (for Jupyter Notebooks)	These platforms aim to make computational reproducibility easier by helping you bundle all data, software, and dependencies into a portable package that others can execute in one click through an easy-to-use graphical interface. Most of these tools focus centrally on sharing analyses. Some, like Code Ocean, integrate with certain journals so that executable analyses can be included within papers.
Containers & Virtual Machines	Docker, Apptainer/Singularity, VirtualBox, VMware, Hyper-V	Containers can package together all software dependencies for better portability. Many of the products in the Reproducibility Platforms category above are built on container and virtualization technologies.
Packaging and dependency capture	ReproZip, Packrat (for R only)	Unlike the categories above, which require bundling dependencies manually, tools in this category aim to collect and package software dependencies automatically.
Workflow automation	Snakemake, Pegasus, Kepler, VisTrails, Clowdr	These systems aim to make it easier to link together data workflows in a reproducible fashion. While workflow tools share general characteristics, many are specific to certain domains. The ones listed here are general purpose, with varying degrees of maturity. The San Diego Supercomputer Center has developed a comparison tool.
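Dedicated tools such as ReproZip and Packrat capture dependencies far more thoroughly than anything shown here, but the core idea of dependency capture can be sketched in a few lines of Python: record the interpreter version, operating system, and installed package versions so that others can rebuild a matching environment. This is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata) and uses only the standard library; the JSON layout is an assumption, and the pinned name==version lines are merely similar to what pip freeze produces.

```python
import json
import platform
from importlib import metadata


def capture_environment():
    """Collect the interpreter, OS, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installations that report no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        # Sort case-insensitively so the record diffs cleanly in version control.
        "packages": dict(sorted(packages.items(), key=lambda item: item[0].lower())),
    }


def to_requirements(env):
    """Format packages as pinned 'name==version' lines, similar to pip freeze."""
    return "".join(f"{name}=={version}\n" for name, version in env["packages"].items())


if __name__ == "__main__":
    # Print the record as JSON; redirect it to a file to keep it with your code.
    print(json.dumps(capture_environment(), indent=2))
```

Redirecting the output to a file (for example, a hypothetical environment.json) and committing it next to your code gives others a record of the environment your results came from; to_requirements can turn the same record into a requirements-style file. Unlike the dedicated tools above, this does not capture system libraries or data files.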

Best practices & resources