The Common Fund Data Ecosystem
Common Fund programs are intended to provide resources that accelerate discovery across many different biomedical research fields. Often these resources include large data sets and the associated digital tools needed to mine and analyze the data. To maximize impact, these data sets and tools must be leveraged by researchers from different disciplines, with varying expertise in bioinformatics and large-scale data analysis. Additionally, these data sets must be usable together across interoperable platforms. However, under current approaches to data storage, management, and analysis, data are often not Findable, Accessible, Interoperable, and Reusable (FAIR).
To address this challenge, the Common Fund is supporting the Common Fund Data Ecosystem (CFDE), an ongoing investment in data management infrastructure that will support past, current, and future Common Fund data sets.
The goals of the CFDE are:
- Enhance the ability to ask scientific questions across data sets
- Enable the uptake, reuse, and addition of Common Fund data and tools
- Support the storage, sharing, and sustainability of Common Fund data sets
- Provide training that maximizes scientists’ ability to upload data and use Common Fund data and other resources
The CFDE includes several integrated efforts:
- CFDE Coordinating Center – The CFDE Coordinating Center will manage and organize CFDE activities, engage with participating Common Fund programs, connect with user communities, support training, develop tools and standards, and provide technical expertise. These activities will be conducted in close partnership with relevant Common Fund programs.
- Participating Common Fund data coordinating centers (DCCs) – DCCs will work with the CFDE Coordinating Center to understand their program’s unique requirements for data storage and analysis, adopt/adapt guidelines and best practices, share resources and tools with other DCCs, establish and enable use cases for cross-data analyses, and provide training. In January 2020, the Common Fund released an Engagement Opportunity Announcement for eligible DCCs to engage with the CFDE Coordinating Center and other DCCs to establish the CFDE. For more details, please view the Engagement Opportunity Announcement.
- Leveraging the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative – A key component of the CFDE is making sure that data are onboarded to the cloud environment in a consistent manner. Working with the STRIDES Initiative from the NIH Office of Data Science Strategy (ODSS), the CFDE will develop guidelines to ensure data are stored and organized optimally for proper data versioning and upkeep. Working with the STRIDES Initiative will also provide favorable pricing for cloud data storage and use of Common Fund data sets.
Using STRIDES for in-kind cloud services for new Common Fund applicants: To fully leverage the Common Fund's investment in STRIDES, Common Fund award applicants will be asked to outline the anticipated type, direct cost, and justification for activities related to cloud computing in the Budget Justification section. These activities include, but are not limited to, data storage, computing, data movement/egress (see below), professional services, training, and related activities. To foster a cloud-centric model that minimizes data movement out of the cloud, data egress fees (i.e., charges for outgoing traffic from cloud environments) should be minimized. Any request to support egress fees incurred by large-scale data download functionalities should have strong justification. If the application is funded, NIH will use this cost estimate to provide in-kind services via STRIDES. The amount requested for cloud services will not be added to the requested budget total and will not count toward the direct cost limit for the award. Upon award, NIH staff will coordinate with awardees to work through logistical details associated with STRIDES accounts. For more information, please see Notice of Information: Leveraging STRIDES for Cloud Computing Activities in Common Fund Awards (NOT-RM-20-009).
The ultimate goal of the CFDE is for Common Fund data to be more usable and useful both within a single program and among data sets from multiple programs. By connecting the data sets and making them more accessible, the CFDE is intended to enable novel scientific research that was not possible before, including hypothesis generation, discovery, and validation.
The way that biomedical research interacts with digital data is changing: large biomedical data sets can no longer be stored or analyzed using local computers and servers. The CFDE will allow larger amounts of data to be stored and will provide the framework for researchers to analyze data from different and diverse data sets simultaneously.
The CFDE will leverage deliverables and lessons learned from the New Models of Data Stewardship program to enhance the utility of Common Fund data sets individually and in combination. This includes a continued effort to improve the FAIRness of Common Fund data sets. The Common Fund recognizes that each data-centric program has both unique needs and needs it shares with other programs. The CFDE Coordinating Center is engaging with Common Fund programs to identify opportunities to provide common infrastructure, resources, and shared best practices to all of the Common Fund’s data-centric programs, such as common metadata standards, authentication and authorization, and a portal to locate and search Common Fund data.
The CFDE Coordinating Center is developing partnerships with nine Common Fund programs, including Gabriella Miller Kids First Pediatric Research (Kids First), Genotype Tissue Expression (GTEx), Library of Integrated Network-based Cellular Signatures (LINCS), Human Microbiome Project (HMP), Human BioMolecular Atlas Program (HuBMAP), Molecular Transducers of Physical Activity Consortium (MoTrPAC), Stimulating Peripheral Activity to Relieve Conditions (SPARC), and Metabolomics. These programs offer different perspectives that enable a deeper understanding of the issues around using and integrating diverse data types, help identify common needs for Common Fund programs, and support collaboration across programs to enhance data utility. Applying best practices and lessons learned from partnerships with an initial set of programs, the CFDE Coordinating Center is now expanding its activities to engage with additional Common Fund programs as the CFDE is established.
More information can be found in presentations from the May NIH Council of Councils meeting where the CFDE and the ODSS efforts were discussed, as well as the September Council of Councils meeting where the concept for the upcoming DCC funding opportunity to establish the CFDE was approved.
This page last reviewed on May 28, 2020