A data repository is a place to archive research datasets and make them publicly available. To select an appropriate repository, take the following steps:
| Step | Note |
|---|---|
| 1. Are you required to deposit in a certain repository? | Some funders and journals require or recommend that datasets be deposited in their repositories. Check the specific requirements or contact us for assistance in making this determination. |
| 2. Is there a discipline-specific repository? | If you have a choice of where to deposit, look for commonly used repositories in your discipline. Some repositories are geared towards groups of disciplines, while others serve a single kind of research. |
| 3. If there is no discipline-specific repository, select a general repository | There are several general-purpose repositories that can fulfill funder and journal sharing requirements. The choice often comes down to personal preferences. |
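
The three steps above work as an ordered checklist: the first step that yields an answer decides. As a rough illustration only, they can be sketched in Python. The function name `choose_repository`, its parameters, and the use of ReDATA as the default general repository are assumptions made for this sketch, not part of any official tool.

```python
from typing import Optional


def choose_repository(
    required: Optional[str],
    discipline_specific: Optional[str],
    general_default: str = "ReDATA",  # assumed default, for illustration only
) -> str:
    """Return the repository suggested by the three steps, checked in order."""
    # Step 1: a funder or journal requirement takes priority.
    if required:
        return required
    # Step 2: otherwise prefer a repository commonly used in the discipline.
    if discipline_specific:
        return discipline_specific
    # Step 3: fall back to a general-purpose repository.
    return general_default


# Hypothetical inputs: no mandate, with and without a discipline repository.
print(choose_repository(None, "GenBank"))
print(choose_repository(None, None))
```

In practice, steps 1 and 2 mean reading funder and journal policies and asking colleagues in your field; the sketch only records the order in which the answers are considered.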
For a one-stop shop that addresses all funder, journal, and University data archiving and sharing requirements, consider ReDATA, the University of Arizona's Research Data Repository.
For archiving open access manuscripts, theses/dissertations, monographs, etc., please visit the Campus Repository. Contact Kimberly Chapman, Director.
Desirable characteristics of data repositories
The Desirable Characteristics of Data Repositories for Federally Funded Research, developed by the National Science and Technology Council (NSTC), are used by many federal funders in their guidance to researchers for selecting a data repository. The NSTC is part of the Office of Science and Technology Policy (OSTP). Data repositories used to comply with federal data sharing requirements should demonstrate these characteristics whenever possible.
| Characteristic | Description | ReDATA |
|---|---|---|
| Unique Persistent Identifiers | Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available. | Yes |
| Long-Term Sustainability | Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events. | Yes |
| Metadata | Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. | Yes |
| Curation and Quality Assurance | Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata. | Yes |
| Free and Easy Access | Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data. | Yes |
| Broad and Measured Reuse | Makes datasets and their metadata available with the broadest possible terms of reuse, and provides the ability to measure attribution, citation, and reuse of data. | Yes |
| Clear Use Guidance | Provides accompanying documentation describing terms of dataset access and use. | Yes |
| Security and Integrity | Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data. | Yes¹ |
| Confidentiality | Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data. | Yes¹ |
| Common Format | Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves. | Yes |
| Provenance | Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata. | Yes² |
| Retention Policy | Provides documentation on policies for data retention within the repository. | Yes |
| Fidelity to Consent | Uses documented procedures to restrict dataset access and use to those that are consistent with participant consent and changes in consent. | Yes³ |
| Restricted Use Compliant | Uses documented procedures to communicate and enforce data use restrictions, such as preventing reidentification or redistribution to unauthorized users. | Yes³ |
| Privacy | Implements and provides documentation of measures (for example, tiered access, credentialing of data users, security safeguards against potential breaches) to protect human subjects’ data from inappropriate access. | No⁴ |
| Plan for Breach | Has security measures that include a response plan for detected data breaches. | Yes |
| Download Control | Controls and audits access to and download of datasets (if download is permitted). | No⁴ |
| Violations | Has procedures for addressing violations of terms-of-use by users and data mismanagement by the repository. | Yes |
| Request Review | Makes use of an established and transparent process for reviewing data access requests. | No⁴ |
¹ Technical and administrative measures (e.g., NetID login, curatorial review prior to publication) help ensure data is not modified without authorization. Furthermore, administrative mechanisms help ensure sensitive data is not made public and, where applicable, that requirements for ethical data sharing are met.
² ReDATA supports dataset versioning to retain a record of prior dataset modifications. Records of modifications from the curation process are retained internally.
³ ReDATA supports de-identified human data but requires documented consent for sharing. ReDATA's terms of use forbid users from attempting to reidentify participants.
⁴ ReDATA is intended for materials that are publicly releasable with unrestricted availability. It does not allow for restricting data downloads, apart from temporary embargoes.
Tools for finding repositories
Data indexers
| Name | Description |
|---|---|
| Re3Data | Registry of Research Data Repositories. A worldwide index of data repositories. |
| FAIRsharing | A database of data repositories and related metadata standards and policies. Also useful for identifying metadata standards when writing a DMP. |
| Google Dataset Search | Search for data across many data repositories and government websites. |
| DataCite Commons | Search across all public data repositories that use DataCite DOIs. |
Other resources
- Awesome Datasets is a collection of community-contributed datasets.
- Scientific Data's Recommended Data Repositories is a list of data repositories recommended by Nature's Scientific Data journal. Data supporting manuscripts submitted to Scientific Data must be deposited in an approved data repository.
- NIH Data Sharing Repositories is a table listing NIH-supported data repositories that accept submissions of appropriate data. Includes the Institute or Center, Repository Name, Description, Submission Policy, and How to Access the Data.
- American Heart Association (AHA) Data Repositories is a list of approved data repositories in support of the AHA Open Science Policy. Subject-focused repositories, when available, are preferred over general repositories.
- Data Repositories (Open Access Directory)
Example data repositories
This list is a sample of popular repositories in various fields. It is not intended to be comprehensive.
| Repository | Discipline | Submission Fees | Notes |
|---|---|---|---|
| ReDATA | All | No cost | Includes data curation service for improved discoverability and reuse. Managed by the U of A Libraries. |
| ICPSR | Political & Social Sciences | No cost to deposit; fees apply for downloads unless the institution is a member | U of A is a member institution. Accepts sensitive data. |
| Qualitative Data Repository | Social sciences | Fees apply (depends on size and complexity) | Provides tools & workflows focused on qualitative data. |
| Dryad | Life sciences, medicine | Fees apply (depends on size) | Primarily accepts data associated with peer-reviewed publications. Only CC0 license allowed. |
| GenBank | Life sciences | No cost | NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. Widely used. |
| NeuroVault | Neurosciences | No cost | Specializes in images of the brain. |
| Vivli | Medicine | Fees apply | Specializes in clinical data sharing. Accepts sensitive data and provides access controls. |
| PANGAEA | Earth & environmental sciences | No cost | Recommended by various international scientific journals. |
| DataONE | Earth sciences | N/A | Well-known and trusted community providing access to earth science data from network members. |
| GBIF | Ecology & biodiversity | N/A | GBIF is a network that indexes and publishes data from partner "publishers", of which there are over 2000. U of A's Herbarium, Insect Collection, and Museum of Natural History are local partners. |
| Harvard Dataverse | All | No cost | Open to researchers outside of Harvard. Accepts all disciplines but is primarily used for social sciences. |
| Zenodo | All | No cost | Widely used, operated by CERN. |
| Figshare | All | No cost | Commercial provider used by ReDATA. The public Figshare.com repository has lower deposit limits and provides no curation, deposit assistance, or continued stewardship of submissions. |