Enhancing the Utility of Common Fund Data Sets

The Common Fund has generated valuable and widely available data sets through multiple programs, containing a variety of multi-dimensional molecular and phenotypic data from several organisms, including mouse and human. As various Common Fund data resources have become available, investigators frequently state that they are unaware of the data and/or of the relevance that the data may have for their research interests. In addition, some report difficulty using or exploring the data because the data portals are non-intuitive and/or the data are otherwise difficult to navigate.

To maximize the impact of these data, engage a broader community of end-users for wider adoption of these data sets, and obtain feedback to enhance the data portals, the Common Fund is supporting administrative supplements that will further the use of Common Fund data sets. These supplements will enable novel and compelling biological questions to be formulated and addressed, and/or cross-cutting hypotheses to be generated for future research. Supplement recipients will also provide feedback on the utility of the Common Fund data resources.

The following supplements have been awarded:

  • Hugo Bellen (Baylor College of Medicine, 3U54NS093793-05S1) - This supplement will build upon Dr. Bellen’s current work on the Model Organism Aggregated Resources for Rare Variant ExpLoration (MARRVEL) by adding data from the Knockout Mouse Phenotyping (KOMP2) and Illuminating the Druggable Genome (IDG) programs. MARRVEL is a web-based tool that provides information from both humans and various animal models on genetic variations that occur in different diseases. By adding these Common Fund data sets to the MARRVEL tool, Dr. Bellen will create a resource that not only expands our ability to search for potential drivers of disease, but also allows for research into potential drugs that may help treat these illnesses.
  • Stephen Burley (Rutgers University, 3R01GM133198-01S1) – This supplement proposes to integrate the Protein Data Bank, a large, open-access resource for information on protein structures, with five Common Fund data sets (4D Nucleome, Genotype-Tissue Expression (GTEx), PHAROS (Illuminating the Druggable Genome), Metabolomics, and Knockout Mouse Phenotyping Program (KOMP2)). The proposed integration will enhance the utility of the Common Fund data sets by providing users access to protein structure information that had not previously been connected to the Common Fund data. Ultimately, this integration is expected to enable investigation of novel biological questions, and promote a more complete understanding of human health and disease.
  • Robert Cornell (University of Iowa, 3R01AR062547-04S1) – This supplement will leverage data from several Common Fund data sets to explore how genes are regulated during melanocyte stem cell generation and maintenance. Melanocyte stem cells play a role in skin and hair pigmentation and are involved in several different skin disorders, including melanoma. This project proposes to integrate Knockout Mouse Phenotyping Program (KOMP2) data on mice with pigmentation defects, 4D Nucleome data from melanoma cell lines, and Genotype-Tissue Expression (GTEx) data on the relationship between gene variants and gene expression levels of melanocyte-related genes. Working across these data sets will lead to a better understanding of the complex regulation of melanocytes and melanoma.
  • Trey Ideker and Nevan Krogan (University of California San Diego, 3U54CA209891-03S1) – This supplement aims to use data sets from the Library of Integrated Network-based Cellular Signatures and PHAROS (Illuminating the Druggable Genome) to develop artificial intelligence techniques to design novel molecules predicted to inhibit cancer protein targets. Molecules identified through this supplement would then be generated and tested in future research in cancer cell lines with genetic changes that are similar to those seen in patients. This research could be a first step towards developing a new approach to designing potent cancer treatments using artificial intelligence.
  • Jeffrey O’Connell (University of Maryland Baltimore, 3U01HL137181-03S1) - This supplement will use data from three Common Fund data sets: the Genotype-Tissue Expression (GTEx), Knockout Mouse Phenotyping (KOMP2), and Library of Integrated Network-based Cellular Signatures (LINCS) programs. These data sets will be integrated into the web-based “Omics Analysis, Search, and Information System” (OASIS) to provide automated integration of Common Fund data sets with end-user generated association results. This new capability will automatically search and highlight connections between a multitude of data sets, all with very different types of biomedical information such as genomics, metabolomics, and proteomics. Researchers with a variety of specialties (e.g., biologists, epidemiologists, physicians, clinicians) will be exposed to the power of existing Common Fund data sets and will benefit from the automated integration provided by OASIS.
  • Douglas Phanstiel (University of North Carolina at Chapel Hill, 3R35GM128645-02S1) – This supplement aims to use data from the 4D Nucleome program to develop computational tools to predict pairs of genes and enhancers, which are regulatory segments of DNA that help control when genes are turned on or off. Enhancers may be located far away from the genes they regulate, making it challenging to identify which genes are the targets of a given enhancer. However, 4D Nucleome time-course data on the temporal patterns of enhancer strength, structural conformation of genetic material and associated proteins, and gene expression will be used to develop new computational approaches to predict these gene-enhancer pairs, leading to a better understanding of how genes are regulated over time.
  • Pinaki Sarder (State University of New York at Buffalo, 3R01DK114485-02S1) – This supplement will use kidney tissue samples collected as part of the Genotype-Tissue Expression (GTEx) program to help develop a computational image analysis method for improved diagnosis of diabetic nephropathy. The large number of healthy kidney tissue samples available through GTEx will add to the samples already collected by Dr. Sarder, improving the computational method and leading to better diagnosis and projection of disease trajectory in patients with diabetic nephropathy.
  • Edwin Silverman (Brigham and Women’s Hospital, 3U01HL089856-13S1) - This supplement will combine information from the Genotype-Tissue Expression (GTEx) and Illuminating the Druggable Genome (IDG) programs to deepen our understanding of chronic obstructive pulmonary disease (COPD), which is the third leading cause of death in the developed world. By using the genetic information present in the GTEx data set, Dr. Silverman aims to identify new genetic changes that may be linked to COPD. Using this information, the study will then search Pharos, the IDG database, for potential new drugs to help treat COPD.
  • Ansley Stanfill (University of Tennessee Health Science Center, 3R01NR017407-02S1) – This supplement will build upon Dr. Stanfill’s current study of aneurysmal subarachnoid hemorrhage (aSAH) in Caucasian and African American cohorts. Genotype-Tissue Expression (GTEx) data will be used to examine the effects of identified genetic variants on brain tissue gene expression in neurotransmitter pathways that are predictive of disability following aSAH. These data may provide insight into the observed disparities in outcomes after aSAH between Caucasians and African Americans. Additionally, GTEx data will be used to explore whether similar gene expression changes are present in the blood, potentially identifying a surrogate marker for brain gene expression that could inform personalized treatment interventions.

This page last reviewed on September 3, 2020