NIH Data Commons Pilot Phase Explores Using the Cloud to Access and Share FAIR Biomedical Big Data

Published on November 9, 2017, by Thérèse Hameau

What is a Data Commons?

A data commons is a shared virtual space where scientists can work with the digital objects of biomedical research, such as data and analytical tools. The NIH Data Commons Pilot will test ways to store, access, and share biomedical data and associated tools in the cloud so that they are FAIR (Findable, Accessible, Interoperable, and Reusable). The goal of the NIH Data Commons is to accelerate new biomedical discoveries by providing a cloud-based platform where investigators can store, share, access, and compute on digital objects (data, software, etc.) generated from biomedical research and perform novel scientific research, including hypothesis generation, discovery, and validation.

How will the NIH Data Commons be implemented?

The NIH Data Commons will be implemented initially as a Pilot Phase in which three high-value datasets will serve as test cases for the principles, policies, processes, and architectures that need to be developed. NIH expects the Pilot Phase will occur over 4 years. The test case datasets include the Genotype-Tissue Expression (GTEx) and the Trans-Omics for Precision Medicine (TOPMed) datasets, as well as several Model Organism Databases (MODs) that are working as a consortium to create an integrated resource known as the Alliance of Genome Resources. Test case dataset selection derives from the high value of these data to many users in the biomedical research community as well as from the diversity of the data they contain. However, it is envisioned the Data Commons will expand to include other data resources once this pilot phase has achieved its primary objectives.

A multidisciplinary NIH Data Commons Pilot Phase Consortium (DCPPC) including data scientists, computer scientists, information technology engineers, cloud service providers, and biomedical researchers will be established in December 2017. This group will be charged with setting community-endorsed processes and metrics for FAIR data management and will develop a consortium roadmap for building the Data Commons. External consultants from industry and academia, volunteering their time and expertise, will be engaged by the NIH to help ensure the Data Commons is maximally useful to users with varying degrees of expertise. Safeguarding the security of data through state-of-the-art user authentication and authorization protocols will be a key focus; interoperability with existing data structures such as the NCI Genomic Data Commons, AHA Precision Medicine platform, and the European Data Commons will also be emphasized.