The Common Fund Data Ecosystem

Common Fund programs are intended to provide resources that accelerate discovery across many different biomedical research fields. Often these resources include large data sets and the associated digital tools needed to mine and analyze the data. To maximize impact, these data sets and tools must be usable by researchers from different disciplines, with varying expertise in bioinformatics and large-scale data analysis. Additionally, these data sets must be usable together across interoperable platforms. However, current approaches to data storage, management, and analysis mean that data are often not Findable, Accessible, Interoperable, and Reusable (FAIR).

To address this challenge, the Common Fund is supporting the Common Fund Data Ecosystem (CFDE), an ongoing investment in data management infrastructure that will support past, current, and future Common Fund data sets.

The goals of the CFDE are:

  • Enhance the ability to ask scientific questions across data sets
  • Enable the uptake, reuse, and addition of Common Fund data and tools
  • Support the storage, sharing, and sustainability of Common Fund data sets
  • Provide training that maximizes scientists’ ability to upload data and use Common Fund data and other resources

The CFDE includes several integrated efforts:

  1. CFDE Coordinating Center – The CFDE Coordinating Center will manage and organize CFDE activities, engage with participating Common Fund programs, connect with user communities, support training, develop tools and standards, and provide technical expertise. These activities will be conducted in close partnership with relevant Common Fund programs.
  2. Participating Common Fund data coordinating centers (DCCs) – DCCs will work with the CFDE Coordinating Center to understand their program’s unique requirements for data storage and analysis, adopt or adapt guidelines and best practices, share resources and tools with other DCCs, establish and enable use cases for cross-data analyses, and provide training. In January 2020, the Common Fund released an Engagement Opportunity Announcement for eligible DCCs to engage with the CFDE Coordinating Center and other DCCs to establish the CFDE. For more details, please view the Engagement Opportunity Announcement.
  3. Leveraging the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative – A key component of the CFDE is ensuring that data are onboarded to the cloud environment in a consistent manner. Working with the STRIDES Initiative from the NIH Office of Data Science Strategy (ODSS), the CFDE will develop guidelines to ensure data are stored and organized optimally for proper data versioning and upkeep. Working with the STRIDES Initiative will also provide favorable pricing for cloud data storage and use of Common Fund data sets.

Using STRIDES for in-kind cloud services for new Common Fund applicants: To fully leverage the Common Fund’s investment in STRIDES, Common Fund award applicants will be asked to outline, in the Budget Justification section, the anticipated type, direct cost, and justification for activities related to cloud computing, including, but not limited to, data storage, computing, data movement/egress, professional services, training, and related activities. To foster a cloud-centric model that minimizes data movement out of the cloud, data egress fees (i.e., charges for outgoing traffic from cloud environments) should be kept to a minimum. Any request to support egress fees incurred by large-scale data download functionality should have strong justification. If the application is funded, NIH will use this cost estimate to provide in-kind services via STRIDES; the amount requested for cloud services will not be added to the requested budget total or count toward the direct cost limit for the award. Upon award, NIH staff will coordinate with awardees to work through logistical details associated with STRIDES accounts. For more information, please see Notice of Information: Leveraging STRIDES for Cloud Computing Activities in Common Fund Awards (NOT-RM-20-009).

The ultimate goal of the CFDE is for Common Fund data to be more usable and useful both within a single program and among data sets from multiple programs. By connecting the data sets and making them more accessible, the CFDE is intended to enable novel scientific research that was not possible before, including hypothesis generation, discovery, and validation.

CFDE resources for NIH staff (requires NIH login)

 

 

This page last reviewed on May 28, 2020