Awardees
The Institute for Precision Health (IPH) Pilot Studies award is designed to support multidisciplinary research projects that address clinical and research needs using data, machine learning and artificial intelligence. Award winners focus on improving human health and well-being and on fostering advances in personalized health and patient-specific care through precision health.
Spatial Transcriptomics of Vulvar Lichen Sclerosus
Lichen sclerosus (LS) is a chronic, inflammatory condition that most frequently affects the vulva (the skin surrounding the opening of the vagina), leading to itch and pain. It significantly impacts quality of life and can lead to irreversible scarring and the development of skin cancer. There are no curative options for vulvar LS (VLS) and no FDA-approved therapies for this condition. Treatment usually consists of lifelong maintenance therapy with a high-potency topical steroid to prevent scarring and decrease the risk of skin cancer. Little is known about the cellular and molecular events that contribute to the development of LS. Our prior work, which evaluated human tissue samples of LS using single-cell RNA sequencing, found altered gene expression in several cell types in LS skin, particularly the epidermal keratinocytes that normally form a protective barrier on the body’s outer surface. While keratinocytes are known to play a role in other inflammatory skin diseases, such as psoriasis, their importance in LS pathogenesis has not been appreciated. We are interested in exploring the role of keratinocytes in LS and whether they signal to other cells to cause inflammation or receive signals from other cells in response to inflammation. To understand how these keratinocytes interact with potential signaling partners, we propose to use a high-resolution spatial technology known as Xenium. This technology will allow us to measure gene activity in tissue samples by mapping where activity occurs at the level of single cells and to compare gene expression across tissue regions. This will allow us to evaluate how close abnormal keratinocytes are to other cells and how they may interact with those cells to form signaling networks.
We hypothesize that this analysis of LS skin will reveal signaling interactions between keratinocytes and immune cells (cells that protect the skin from infection but can cause inflammation), fibroblasts (cells that form the fibrous material in the skin), and melanocytes (cells in skin that produce pigment). Such findings will be critical to further our understanding of the pathogenesis of LS. Knowing the events that occur in LS at a molecular level and how keratinocytes are involved in LS will provide crucial information necessary for the identification of earlier diagnostic markers and improved skin-directed therapies.
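The proximity question described in this abstract can be illustrated with a minimal Python sketch. It assumes each cell has already been assigned a type and an (x, y) position; the field names, cell labels, coordinates and radius are hypothetical and are not taken from the project or from any Xenium output format.

```python
import math

def neighbor_type_counts(cells, source_type, radius):
    """Count, per cell type, the cells lying within `radius` of at least
    one cell of `source_type` (cells of the source type are skipped)."""
    sources = [c for c in cells if c["type"] == source_type]
    counts = {}
    for cell in cells:
        if cell["type"] == source_type:
            continue
        near = any(
            math.dist((cell["x"], cell["y"]), (s["x"], s["y"])) <= radius
            for s in sources
        )
        if near:
            counts[cell["type"]] = counts.get(cell["type"], 0) + 1
    return counts

# Hypothetical cell positions in arbitrary units.
cells = [
    {"type": "keratinocyte", "x": 0.0, "y": 0.0},
    {"type": "t_cell", "x": 3.0, "y": 4.0},        # 5 units away
    {"type": "melanocyte", "x": 6.0, "y": 8.0},    # 10 units away
    {"type": "fibroblast", "x": 30.0, "y": 40.0},  # 50 units away
]
print(neighbor_type_counts(cells, "keratinocyte", radius=15.0))
```

Counting neighbors within a fixed radius is only one simple way to summarize spatial proximity; an actual analysis would likely rely on dedicated spatial statistics.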
Liquid Biopsy for the Early Detection of Early-Onset Colorectal Cancers (EOCRC), with a Particular Focus on Underrepresented Populations
This research project addresses the urgent health issue of early-onset colorectal cancer (EOCRC), defined as colorectal cancer occurring in individuals under 50 years old. EOCRC is a growing concern and is expected to be the leading cause of cancer-related deaths among people aged 20 to 49 by 2030, especially in underrepresented populations such as Hispanics and African-Americans. The project’s primary aim is to develop a new blood-based liquid biopsy test that is effective in detecting EOCRC across all racial groups. This is crucial because current colorectal cancer screening tests are mainly designed for late-onset cases (those occurring in individuals aged 50 and above) and are less effective for EOCRC. Our research hypothesis is based on preliminary data suggesting significant differences in the epigenetic landscape (chemical marks that influence gene activity without altering the DNA sequence) between EOCRC and late-onset colorectal cancer, with these differences further varying across racial groups. The research method involves collecting blood samples from a diverse group of EOCRC patients at UC Irvine Health and analyzing the DNA in these samples using Enzymatic Methyl-seq (EM-seq). This technique allows us to study specific epigenetic features, focusing on DNA methylation and nucleosome occupancy, which are crucial to understanding how EOCRC develops in different racial groups. The data from these analyses will be integrated into a comprehensive model called Multimodal Epigenetic Sequencing Analysis (MESA) for early detection of EOCRC. To ensure the test’s accuracy and applicability across different populations, we plan to conduct independent validations using patient groups from various geographic areas and compare the results with traditional colonoscopy methods. The potential impact of this research is significant.
If successful, it will result in the first effective liquid biopsy test for EOCRC that is suitable for patients of all racial backgrounds. This innovation could dramatically improve early detection of colorectal cancer, particularly in younger individuals and underrepresented racial groups who are currently at a higher risk. Early detection is key to increasing survival rates and reducing the burden of this disease. By providing a reliable, non-invasive method for early cancer detection, this test could save many lives and represent a significant advancement in personalized healthcare and cancer management, ensuring that all communities have access to effective screening tools.
Data-Driven Discovery of Clinically Relevant Subgroups in Alzheimer’s Disease
Alzheimer’s disease (AD), the most prevalent neurodegenerative disorder in late life, presents a significant societal and economic challenge, particularly as the population ages. Despite the escalating burden, developing effective drugs for AD has proven exceptionally challenging, marked by a high failure rate in clinical trials and only modest efficacy in recently FDA-approved therapies. A key challenge lies in the biological and phenotypic heterogeneity observed among AD patients. While abundant data have been collected across the AD spectrum, the understanding of why some patients exhibit rapid clinical progression while others remain stable is still lacking. Traditionally, selecting participants for therapeutic interventions relies on strict inclusion and exclusion criteria to reduce heterogeneity and target specific patient populations. However, a significant proportion of individuals selected using this approach do not show clinical progression in the short to mid-term. This phenomenon negatively impacts research trial outcomes and, in clinical settings, may subject patients to unnecessary risks. Identifying disease subtypes using data-driven methodologies might provide a solution to this challenge. In this project, the focus is on aggregating and harmonizing data from the placebo arms of eight randomized clinical trials of AD, involving 4,139 participants. This high-dimensional database serves as the foundation for developing and training machine learning models. The overarching goal is to classify patients based on shared clinical and pathological disease-progression patterns, with the validation of these models in independent samples. For this purpose, we will initially aggregate and harmonize multimodal data from multiple trials into a machine-learning-friendly format (Aim 1). Subsequently, we will use machine learning models to identify homogeneous subgroups within the developed database (Aim 2).
Finally, we will evaluate whether focusing on subsamples from the previous steps could improve the design of future clinical trials, thereby increasing the likelihood of success and reducing costs. Results from this project can not only impact the design of future clinical trials but also improve clinical decision-making in real-world settings by identifying patients most likely to benefit from therapies that are becoming widely available.
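The idea of grouping participants by progression pattern can be shown with a minimal Python sketch. The abstract does not name the models to be used, so a one-dimensional two-group k-means on each participant's rate of change stands in for them here; the visit times, scores and function names are hypothetical.

```python
def slope(times, scores):
    """Least-squares slope of scores over time (score change per unit time)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

def two_means(values, iterations=20):
    """1-D k-means with k=2, initialized at the minimum and maximum value.
    Returns 0 for the low-valued group and 1 for the high-valued group."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return labels

# Hypothetical placebo-arm visits: months since baseline and a score
# on a scale where higher values mean worse cognition.
months = [0, 6, 12]
participants = {
    "p1": [20, 20, 21],
    "p2": [20, 26, 32],
    "p3": [22, 22, 22],
    "p4": [18, 25, 30],
}
rates = [slope(months, scores) for scores in participants.values()]
labels = two_means(rates)
print(dict(zip(participants, labels)))  # 1 = faster progression in this toy data
```

A real analysis would use many modalities per participant and validate the subgroups in independent samples, as the abstract describes.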
An LLM-based Interactive and Personalized Coaching for Chronic Health Management
This study introduces an innovative framework for chatbots, which are used in healthcare for engaging and coaching patients. Existing chatbots, such as ChatGPT, mainly focus on conversation and lack the ability to solve complex problems or analyze personalized data. They cannot integrate individual health data like vital signs or Electronic Health Records, resulting in generic responses that may not meet specific health needs. The proposed framework aims to address this by empowering Conversational Health Agents (CHA) with a Large Language Model (LLM)-based problem-solving core, enabling them to provide personalized responses to health-related queries. This will be achieved by integrating the CHA framework with the Syntropy platform through IPH for secure, real-time data and analytical services. The Syntropy platform will be used to host health data from over 5,000 in-home patients, collected over a year via iHealth Labs devices. We will use Syntropy data ingestion and analysis services to tailor responses in the CHA. The framework’s effectiveness will be tested through a case study focusing on coaching for diabetes management. This development has the potential to fundamentally transform healthcare by providing a cost-effective, accessible alternative for chronic condition management, enhancing healthcare access for all, and offering a foundational platform for future health data analysis and personalized coaching. The release of the CHA-Syntropy platform for IPH services and of the dataset for UCI community research are also anticipated as significant impacts of this project.
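To illustrate what tailoring a response with individual health data could look like, here is a minimal Python sketch that summarizes a patient's glucose readings and places the summary in the text handed to a language model. The function names, the prompt wording and the 70 to 180 mg/dL target range are illustrative assumptions; the sketch does not call the CHA framework, Syntropy or any LLM.

```python
from statistics import mean

def summarize_glucose(readings_mg_dl, low=70, high=180):
    """Summarize glucose readings; the target range is an illustrative default."""
    in_range = [r for r in readings_mg_dl if low <= r <= high]
    return {
        "mean_mg_dl": round(mean(readings_mg_dl), 1),
        "in_range_pct": round(100 * len(in_range) / len(readings_mg_dl), 1),
    }

def build_prompt(question, summary):
    """Combine the patient's data summary with their question (hypothetical format)."""
    context = (
        f"Recent glucose: mean {summary['mean_mg_dl']} mg/dL, "
        f"{summary['in_range_pct']}% of readings in target range."
    )
    return f"Patient data: {context}\nPatient question: {question}"

# Hypothetical home-device readings in mg/dL.
readings = [95, 140, 210, 65, 120]
summary = summarize_glucose(readings)
print(build_prompt("How did my glucose look this week?", summary))
```

Keeping the data analysis in ordinary code and passing only its result to the model is one possible design; the abstract does not specify how the proposed framework divides this work.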
Reducing Information Inequality in Neurologic Rehabilitation through AI
After a stroke, finding proper information about rehabilitation can be difficult for many people. They often struggle to get answers to their high-value questions. This lack of information can affect their health and quality of life. Our goal is to develop innovative technology that helps stroke survivors access clear and helpful information about their recovery. To do so, we will build a chatbot that understands and answers the most important questions stroke survivors commonly have about stroke rehabilitation. It will offer accurate and personalized information in a way that is easy to understand. Our project has three main steps:
- Finding Important Questions: We will talk to stroke survivors to discover the most important questions they have about their recovery. We will do this through group discussions in both English and Spanish. In addition, we will ask rehabilitation therapists what questions they think their patients should be asking.
- Creating the Chatbot: We will build a chatbot that uses large language models (LLMs) – a type of artificial intelligence used to simulate how humans talk. We will train this chatbot with expert-approved information and design it to give personalized answers. We will ensure that the chatbot delivers correct and relevant information, as judged by experienced rehabilitation providers.
- Testing the Chatbot: We will ask a group of stroke survivors to use the chatbot for four weeks to learn about stroke and rehabilitation. We will then check if the chatbot helped them to improve their understanding of health information and their health-related quality of life.
We believe the chatbot will help stroke survivors access better information, especially if they are in a situation where they have limited chances to interact with skilled rehabilitation providers. It will be better than searching the web or using current AI chatbots, which are not well-trained for rehabilitation questions, and it will help them better understand their condition and make informed decisions about their health. This should lead to a better health-related quality of life. If the chatbot shows promise in this pilot study, we will refine and test it more widely. Our ultimate goal is to reduce the gap in information access for people recovering from a stroke, especially those in underserved areas.
In summary, this project is about using new AI technology to create a helpful tool for people recovering from a stroke. This tool will provide them with easy-to-understand, accurate information about their rehabilitation. We hope this will improve the ability to access, understand and apply rehabilitation information.
Machine Learning in the Detection of Severe Tricuspid Regurgitation
Tricuspid regurgitation (TR) is a heart condition affecting 85% of the population, causing blood to flow backward through the tricuspid valve. As TR progresses, it can lead to heart failure, with higher mortality linked to the severity of TR. New transcatheter techniques offer a promising alternative to surgery, but their full potential is hindered by late-stage diagnoses and potential underdiagnosis.
Our goal is to create a machine learning algorithm that can quickly and accurately detect severe TR using echocardiography, a common diagnostic method that involves analyzing 13 different factors. We believe that machine learning can efficiently identify complex conditions like TR, building on successful pipelines used for other cardiac conditions such as aortic stenosis and heart failure with preserved ejection fraction.
Our first aim is to understand the true prevalence of severe TR at our institution. We will reevaluate echocardiograms previously identified with moderate, moderate-severe and severe TR. Two certified echocardiographers, unaware of the original diagnosis, will reevaluate the studies using specific guidelines. We will measure the agreement between the original and reevaluated studies using Cohen’s kappa coefficient. We expect to find a significant difference between the original and reevaluated studies, with higher mortality linked to studies regraded as severe.
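For readers unfamiliar with Cohen's kappa, the following minimal Python sketch computes it from two lists of grades. The grades shown are made up for illustration and are not study data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for the
    agreement expected by chance from each rater's label frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical original and reevaluated grades for six echocardiograms.
original = ["severe", "severe", "moderate", "moderate", "severe", "moderate"]
regraded = ["severe", "moderate", "moderate", "moderate", "severe", "severe"]
print(round(cohens_kappa(original, regraded), 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance. The formula is undefined when both raters assign every case the same single label, because chance agreement is then 1.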
Our second aim is to develop a machine learning algorithm to automatically detect severe TR in echocardiograms. We will train the algorithm using unsupervised learning, allowing it to learn from the data without explicit instructions. The algorithm will then predict the severity of TR: severe, not severe or nondiagnostic. We will evaluate the algorithm’s performance by measuring its sensitivity, specificity and overall accuracy. Additionally, we will create activation maps to highlight the most crucial features of the diagnosis. Finally, we will test the algorithm on previously graded cases for comparison with the reevaluations.
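The performance measures named above can be computed as in this minimal Python sketch. It assumes that nondiagnostic studies have already been set aside so that every case is either "severe" or "not severe"; that handling, and the example labels, are assumptions for illustration only.

```python
def binary_metrics(y_true, y_pred, positive="severe"):
    """Sensitivity, specificity and accuracy for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # share of severe cases detected
        "specificity": tn / (tn + fp),  # share of non-severe cases cleared
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical reference grades and algorithm predictions.
truth = ["severe"] * 3 + ["not severe"] * 5
preds = ["severe", "severe", "not severe",
         "not severe", "not severe", "not severe", "severe", "not severe"]
print(binary_metrics(truth, preds))
```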
Successfully developing this algorithm could lead to earlier identification of severe TR, increasing access to transcatheter repair for those at high surgical risk. Moreover, it could extend treatment to areas lacking comprehensive echocardiography resources and enable effective tracking of outcomes post-repair. Our work contributes to the broader field of automated cardiovascular screening, marking a crucial step in understanding the complex nature of TR and improving patient outcomes.
Advancing Early Alzheimer’s Disease Detection Using Multi-Modal Deep Learning Models Across the Adult Age Spectrum
As of now, approximately 50 million individuals worldwide are living with dementia, a number projected to triple in the next 30 years. Among the elderly, cognitive impairment, particularly Alzheimer’s Disease (AD), poses significant challenges for patients, families, society and healthcare systems. Therefore, early detection of cognitive impairment holds immense potential for enhancing patient treatment through timely interventions.
Aim 1: Identify Language and Acoustic Markers Associated with Cognition using Diverse Cohorts
The primary goal is to employ deep learning approaches to identify language markers consistently linked with dementia and mild cognitive impairment (MCI) within three cohorts: The 90+ Study, ADRC and Down Syndrome.
- 1a) Utilize a substantial sample of 90+ participants with longitudinal video data (n=1,300) to identify language markers using spontaneous speech associated with MCI, dementia and neuropsychological measures.
- 1b) Analyze language samples from the ADRC cohort (n=250 and growing) to identify cross-sectional associations and explore unique markers revealed by two speech elicitation methods: semi-structured interviews vs. picture descriptions.
Aim 2: Identify Language and Acoustic Markers Associated with AD Biomarkers
Leverage AD biomarkers from ADRC cohorts to explore associations between specific language markers and AD biomarkers. The analysis will involve evaluating the data from cohort participants, considering both biomarkers and language markers.
Aim 3: Determine Utility of Deep Learning based Language Markers for Diagnosis in Down Syndrome
We aim to test the hypothesis that speech and language markers associated with cross-sectional cognitive outcomes can be identified in participants with Down Syndrome.
For Aim 1, a two-sided multi-modal architecture will be employed, with a BiLSTM network on each side trained on acoustic and linguistic features extracted from processed speech samples. Pretrained language models will be used for speech recognition and feature extraction. A customized Large Language Model (LLM) will be trained on participants’ data and could replace current language models. In tests on preliminary data from The 90+ Study (40 CIND and 50 NC), the proposed deep learning-based model showed ~98% AUC for cognitive impairment detection.
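The AUC figure quoted above is the area under the ROC curve, which equals the probability that a randomly chosen impaired participant receives a higher model score than a randomly chosen unimpaired one. The minimal Python sketch below computes it directly from that definition; the labels and scores are invented for illustration and are unrelated to the preliminary data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = cognitively impaired) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))
```

This pairwise version is quadratic in the number of samples, which is fine for small cohorts; library implementations use a sort-based method instead.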
For Aim 2, association analysis will be conducted using CSF measurements, amyloid PET and MoCA scores in ADRC cohorts, evaluating possible associations between biomarkers and digital language markers through regression models. In Aim 3, association analysis will be performed in the Down Syndrome cohort. The model and customized LLM will undergo fine-tuning, and the associations and language markers identified in the ADRC and 90+ Study data will be validated using the Down Syndrome cohort.
This project’s outcomes are expected to revolutionize early detection of Alzheimer’s disease across diverse cohorts using deep learning-based approaches. Finding associations of language markers with cognitive function and AD biomarkers promises a more accessible approach to identifying cognitive conditions. The anticipated deep learning-based model employing a customized LLM holds the potential to enhance treatment outcomes, improve patients’ quality of life, and enable earlier detection of AD, contributing to advancements in Alzheimer’s research and diagnosis.
Machine Learning-Driven Approaches for Detection of Deleterious Missense Variants in Rare Diseases from Exome Sequencing
This research project focuses on enhancing the diagnostic yield of genetic testing for various diseases, particularly neuromuscular disorders, by developing and integrating advanced computational models into the diagnostic process. Despite the advancements in genome sequencing, a significant number of patients remain undiagnosed due to the high volume of genetic variants of unknown significance. This project aims to address this challenge by improving the identification and interpretation of potentially deleterious missense variants.
The first aim involves creating a new protein language model training scheme that utilizes both cross-species and within-species evolutionary data. This approach is designed to overcome the limitations of current models and improve the accuracy of predicting deleterious missense variants. By leveraging evolutionary data, the model will be better equipped to identify harmful genetic changes, thereby enhancing disease diagnostics.
The second aim focuses on the comprehensive evaluation and integration of state-of-the-art protein language models. This involves benchmarking these tools to understand their strengths and weaknesses and then combining them to create an ensemble model. This model will provide a single, calibrated score for variant pathogenicity, aligning with clinical guidelines and standards. Calibration is critical to ensure the precision of these models and their usability in clinical settings.
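One simple way to picture an ensemble with a calibrated output is sketched below in Python. It averages per-model scores and then maps the average to the fraction of known pathogenic variants observed in the same score bin (histogram binning). Both choices, along with the scores, labels and function names, are assumptions for illustration; the abstract does not state how the ensemble or the calibration will be built.

```python
def ensemble_score(model_scores):
    """Unweighted mean of per-model scores, each assumed to lie in [0, 1]."""
    return sum(model_scores) / len(model_scores)

def fit_bin_calibrator(raw_scores, labels, n_bins=2):
    """Histogram binning: learn, for each score bin, the observed fraction
    of pathogenic variants (label 1) in a labeled reference set."""
    totals = [0] * n_bins
    positives = [0] * n_bins
    for score, label in zip(raw_scores, labels):
        b = min(int(score * n_bins), n_bins - 1)
        totals[b] += 1
        positives[b] += label
    rates = [p / t if t else None for p, t in zip(positives, totals)]

    def calibrate(score):
        """Return the learned rate for the bin containing `score`
        (None if the reference set had no variants in that bin)."""
        return rates[min(int(score * n_bins), n_bins - 1)]

    return calibrate

# Hypothetical reference variants: ensemble scores and known labels.
reference_scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
reference_labels = [0, 0, 1, 0, 1, 1]
calibrate = fit_bin_calibrator(reference_scores, reference_labels)

# Hypothetical scores from three models for one new variant.
raw = ensemble_score([0.9, 0.7, 0.8])
print(raw, calibrate(raw))
```

With only two bins the calibrated output is coarse; more bins or a smoother method would need a larger labeled reference set.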
The final aim is to apply these models to genome sequencing patient data from UCI hospital, specifically focusing on individuals with undiagnosed neuromuscular diseases. This involves prioritizing potentially deleterious variants from exome sequencing data and validating these predictions using data from large neuro-disease consortia. The project will integrate multi-omics data, such as transcriptomics, proteomics, and chromatin accessibility, to understand the functional impact of these variants. This will allow for a more comprehensive interpretation of the genetic underpinnings of neuromuscular and other diseases.
The project is significant in its potential to enhance genetic disease diagnostics. By using innovative protein language models to identify deleterious missense variants, the research aims to improve diagnostic rates and contribute to the understanding of a range of disorders. The proposed methodology is widely applicable to various diseases, including neuromuscular disorders and cancer, where missense variants play a critical role.