
1111 Engineering Drive, Boulder, CO 80309


Assembling Representative Clinical Trial Cohorts with AI Approximate Similarity Search

Abstract: Healthcare providers make treatment decisions by blending their experience, patient concerns, and data from clinical trials. While trials are the gold standard for determining the safety and effectiveness of medical interventions, health disparities arise because it is challenging to apply the results of these studies to patients who differ from the subjects within the study cohort. Ding et al. (Nature, 2023) quantified this effect by showing that the less genetically similar an individual is to those in the reference cohort, the less accurate their predicted disease risk is. Their results also suggested that the straightforward approach of replicating a study for different groups (e.g., by genetic ancestry) provided only a marginal improvement for those who fit neatly into a group and none for those who don't. Mitigating health disparities from inadequate trial cohort representation may require dynamically creating reference cohorts for individual patients. While university biobanks are approaching the size and diversity required to fulfill most patient needs, traditional indexing and exact search methods do not scale. We propose leveraging AI-powered approximate similarity search architectures, built initially for image search, to dynamically find cohorts of genetically similar individuals. These cohorts can, for example, be used to replicate clinical studies to generate evidence that informs representative treatment recommendations for all patients, especially those from populations that have traditionally been marginalized in healthcare research. These search architectures use numerical representations known as embeddings to identify the nearest neighbors of an object. Similar objects are represented by embeddings that are close in the numerical space.
Our Genotype Similarity Search (GenoSiS) method uses a Siamese neural network to learn genetic embeddings that produce representative cohorts in less than a second among the families in deCODE pedigrees, across populations among the International Genome Sample Resource's cohorts, and in the 74,000-sample Colorado Center for Personalized Medicine biobank. While existing methods offer static clusters and NxN similarity metrics, GenoSiS avoids recomputing significant portions of the matrix when samples are added and can be extended to include non-genetic factors critical to building representative cohorts across different diseases. The software is available for download at https://github.com/kristen-schneider/precision-medicine.
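To make the embedding idea concrete, here is a minimal sketch of a nearest-neighbor lookup over embeddings. It is not the GenoSiS implementation: the class `EmbeddingIndex`, the sample names, and the two-dimensional vectors are hypothetical, and the search is an exact brute-force scan rather than the approximate index the abstract describes. It only illustrates that similar samples sit close together in the embedding space and that a new sample can be added without recomputing an NxN similarity matrix.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class EmbeddingIndex:
    """Toy exact index (hypothetical); real systems use approximate search."""

    def __init__(self):
        self._items = {}

    def add(self, sample_id, embedding):
        # Adding a sample stores one vector; no pairwise matrix is rebuilt.
        self._items[sample_id] = tuple(embedding)

    def nearest(self, query, k):
        # Rank every stored sample by its distance to the query embedding.
        ranked = sorted(
            self._items.items(),
            key=lambda item: euclidean(query, item[1]),
        )
        return [sample_id for sample_id, _ in ranked[:k]]


# Made-up embeddings: A and B are close together, C and D are close together.
index = EmbeddingIndex()
index.add("sample_A", [0.10, 0.20])
index.add("sample_B", [0.12, 0.18])
index.add("sample_C", [0.90, 0.85])
index.add("sample_D", [0.88, 0.90])

patient = [0.10, 0.21]
cohort = index.nearest(patient, k=2)

# A newly enrolled sample becomes searchable immediately.
index.add("sample_E", [0.11, 0.20])
updated = index.nearest(patient, k=2)
```

With these toy vectors the patient's cohort is first `sample_A` and `sample_B`; after `sample_E` is added, it displaces `sample_B`. A scan like this costs time proportional to the number of samples per query, which is the scaling problem approximate search architectures are meant to address.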

Bio: Ryan is an Assistant Professor at the BioFrontiers Institute and Computer Science Department in Boulder. He received his PhD in Computer Science from the University of Virginia and completed a postdoc in the Human Genetics Department at the University of Utah. His research focuses on developing methods to explore large-scale genetic datasets, and he is particularly interested in structural variation and rare diseases. He is also committed to improving the reliability and reproducibility of scientific software and teaches a Software Engineering for Scientists class every fall.

Please join us in ECCR 265 or on Zoom at: https://cuboulder.zoom.us/j/91008309605

 
