The Common Fund Data Ecosystem (CFDE) program aims to enable broad use of the data generated by the Common Fund's many programs by creating a data ecosystem: the management infrastructure, analytics, applications, and user interfaces needed to work within and across existing Common Fund data sets. To continue developing and growing this ecosystem, the CFDE has now funded two more Common Fund Data Coordinating Centers (DCCs), along with additional partnership projects among DCCs.
Data Coordinating Centers newly engaged with CFDE
The new DCCs joining the CFDE represent the Glycoscience and 4DN programs.
- The Glycoscience DCC will work to integrate glycoscience data from its knowledgebase, GlyGen, into the CFDE. GlyGen incorporates and harmonizes data for glycans, proteins, and glycoproteins from many sources, aiming to map connections among glycans, genes, and proteins.
- The 4D Nucleome (4DN) DCC further expands the CFDE by adding multimodal data, including sequencing-based and imaging data, collected to understand how three-dimensional chromosomal interactions affect long-range gene regulation, chromosomal dynamics under perturbation, and non-coding variants in the genome.
The addition of these DCCs and datasets will help expand CFDE by contributing a wealth of information about nuclear organization and about roles that glycans play in organisms while increasing the diversity of data types included within CFDE. They join eight DCCs including Extracellular RNA Communication (ExRNA), Gabriella Miller Kids First (Kids First), Genotype-Tissue Expression (GTEx), The Human BioMolecular Atlas Program (HuBMAP), Illuminating the Druggable Genome (IDG), Library of Integrated Network-based Cellular Signatures (LINCS), Metabolomics, and Stimulating Peripheral Activity to Relieve Conditions (SPARC) programs that were initially funded in FY20. Together with the CFDE-Coordination Center (CFDE-CC), these awardee teams are continuing to advance development of processes for harmonizing basic metadata elements, providing data sets for the CFDE Portal, forming a culture of sharing insight and knowledge across DCCs, and contributing to CFDE-wide training and outreach efforts.
New Partnerships among Data Coordinating Centers
Six new DCC partnership projects have also been funded by the CFDE. These collaborative projects will develop approaches and tools to harmonize data and workflows from multiple Common Fund programs, enabling cross-dataset analysis. The partnerships are meant to enhance DCC-DCC interactions and to demonstrate to the broader scientific community the utility of their data integration tools and approaches for Common Fund datasets. The projects and their partnering DCCs are:
- Anatomical Interoperation of Resources: Partnering DCCs: SPARC, HuBMAP
This project will compare the spatial distribution of gene expression in the heart across developmental stages and across health and disease states, which is critical to improving the understanding of cardiac pathologies. It will draw on data from the SPARC and HuBMAP programs, registering tissue architecture, neural and/or vascular tracings, RNA-seq, and other data types against a common coordinate cardiac spatial scaffold.
- Gene Burden Testing: Partnering DCCs: Kids First, HuBMAP
This project will enhance the capabilities of the HuBMAP Knowledge Graph. The aim is to enable HuBMAP and Kids First workflows to run seamlessly on both HuBMAP and Kids First infrastructure and to establish standards and solutions that point the way to broader workflow interoperability within the CFDE. The Knowledge Graph will enable finding and accessing the data sets relevant to queries such as “do children with congenital disabilities have an overabundance of variants in genes that are expressed in specific cell types in tissues of interest?”
- CFDE Gene Centric Prototype Dashboard: Partnering DCCs: ExRNA, Glycoscience, GTEx, HuBMAP, IDG, Kids First, LINCS, and Metabolomics
This project will develop methods to harmonize gene, protein, and RNA identifiers and generate a cloud workspace that pools gene information from DCCs for use cases. The work will involve developing standards for gene landing pages and gene-centered APIs, and building a prototype dashboard for gene cards from the DCCs and other resources.
- CLinical Observations and Vocabularies (CLOVoc): Partnering DCCs: Kids First, Metabolomics, SPARC
This project will build FAIR (findable, accessible, interoperable, and reusable) metadata about human clinical data and facilitate interoperability among these datasets. The effort will develop a minimal clinical metadata framework and APIs to facilitate discoverability and interoperability, and will develop FHIR (Fast Healthcare Interoperability Resources) profiles of clinical metadata across partnering DCCs. The goals are to improve the ability to query across Common Fund datasets for a given disease, phenotype, or clinical profile and to integrate different datasets so that they are interoperable and reusable for secondary analyses.
- Aggregation and Sharing of Variant-centric Information: Partnering DCCs: ExRNA, GTEx, and Kids First
This project aims to make CFDE variant data FAIR by establishing a framework to derive information about specific variants and regulatory elements from high-volume -omics profiling datasets, in order to interpret non-coding variants.
- Toxicology Screening Pipeline: Partnering DCCs: IDG, Kids First, LINCS, and SPARC
This project will develop a pipeline infrastructure that will tag CFDE Portal records for genes, their products, and small-molecule xenobiotics with labels of toxicity potential for reproductive and developmental processes.
Learn more about these CFDE awards by visiting the Funded Research page.
Innovative collaborations will create useful tools for scientific discovery
The Common Fund Data Ecosystem (CFDE) aims to enable new ways of doing science by creating an ecosystem—the data management infrastructure, analytics, applications, and user interfaces needed to work within and across existing Common Fund data sets. The CFDE took a major step toward creating this resource by launching a set of collaborative projects that bring together eight Common Fund Data Coordinating Centers (DCCs) to help build this functional ecosystem for answering important biological questions, such as uncovering new molecular pathways and illuminating disease mechanisms.
The Common Fund DCCs will contribute a wealth of diverse data sets, spanning basic biology to clinical research, and will work towards making their data more useful alone and in combination with other data sets. The participating DCCs include Extracellular RNA Communication (ExRNA), Gabriella Miller Kids First (Kids First), Genotype-Tissue Expression (GTEx), The Human BioMolecular Atlas Program (HuBMAP), Illuminating the Druggable Genome (IDG), Library of Integrated Network-based Cellular Signatures (LINCS), Metabolomics, and Stimulating Peripheral Activity to Relieve Conditions (SPARC) programs. Their collaborative projects will tackle important challenges in biomedical research and human health, including (but not limited to):
- Innovative strategies for data-driven treatment planning—coupling drug and small molecule predictions with patient gene activity data to uncover key molecular pathways and help with developing effective treatment strategies, predicting drug responses, identifying the best candidate drugs for specific patients, and tracking disease progression and recovery. Participating DCCs: GTEx, IDG, Kids First, LINCS, Metabolomics
- New drug targets for pediatric cancer treatments—identifying new potential therapeutic targets for specific types of pediatric cancers by comparing the gene activity differences between tumors and healthy organ tissue. Participating DCCs: GTEx, Kids First, LINCS
- Novel insights into complex conditions—generating multi-layered organ maps that will incorporate genetic mutations, structural birth defects, and gene activity changes during development, to create a powerful tool for studying complex conditions like Down syndrome. Participating DCCs: ExRNA, HuBMAP, Kids First, SPARC
- Solutions for working with data in the cloud—exploring new ways to combine data sets and discover solutions for working across independent cloud-based platforms. Participating DCCs: ExRNA, GTEx, HuBMAP, IDG, Kids First, LINCS, Metabolomics, SPARC
Demonstrating the value of these data sets, particularly in combination, will help the research community see what kinds of new research questions can be asked of and answered by the data. CFDE will also make the data more accessible through a cloud-based public web portal. As these exciting projects begin, they hold the potential for opening new doors to scientific discovery and informing innovative approaches to improving human health.
Learn more about these CFDE engagement awards by visiting the Funded Research page.
This page last reviewed on November 16, 2021