- Research Highlight
Genomics
Nature Biotechnology volume 43, page 501 (2025)
Human ancestry has a considerable impact on gene expression, but genomic datasets for disease analysis severely underrepresent non-European populations, thereby limiting the advancement of precision medicine. In a paper in Nature Communications, Smith et al. introduce a machine learning tool to mitigate the effects of ancestral bias in transcriptomic data.
The tool, called PhyloFrame, creates ancestry-aware signatures of disease by integrating population genomics data with smaller, disease-relevant training datasets. PhyloFrame first fits a logistic regression model with a LASSO penalty to obtain an initial set of disease-relevant genes. It then uses population genomics data to help compensate for shifts in data distribution caused by differences in human ancestry. In short, PhyloFrame projects the initial disease signature onto a functional interaction network and extends it to include the first and second neighbors of each signature gene. This expanded set is then filtered by a statistic termed enhanced allele frequency (EAF), which captures population-specific allelic enrichment in healthy tissue, to identify ancestrally diverse genes that interact with the original signature. For each ancestry, a selected subset of genes with high EAF and gene expression variability in the training data is added to the PhyloFrame signature. Retraining the model with the forced inclusion of these equitable genes yields a signature of disease that generalizes to all populations, including those not represented in the training data.
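The network-expansion and filtering steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names, ancestry labels, EAF scores, and the ranking rule (EAF first, expression variance as tie-breaker) are all hypothetical, and EAF is treated here as a precomputed per-gene score rather than derived from allele frequencies. The LASSO fit and the final retraining step are omitted.

```python
from collections import deque


def neighbors_within(network, seeds, max_hops=2):
    """Return genes reachable from any seed in at most max_hops edges (seeds excluded)."""
    dist = {gene: 0 for gene in seeds}
    queue = deque(seeds)
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in network.get(gene, ()):
            if nb not in dist:
                dist[nb] = dist[gene] + 1
                queue.append(nb)
    return {gene for gene, d in dist.items() if d > 0}


def select_by_eaf(candidates, eaf, expr_var, top_k=1):
    """Per ancestry, keep the top_k candidates ranked by EAF, then expression variance.

    The ranking rule is an assumption made for illustration only.
    """
    chosen = {}
    for ancestry, scores in eaf.items():
        ranked = sorted(
            (g for g in candidates if g in scores),
            key=lambda g: (scores[g], expr_var.get(g, 0.0)),
            reverse=True,
        )
        chosen[ancestry] = ranked[:top_k]
    return chosen


# Toy functional interaction network (hypothetical genes, undirected edges).
network = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G4"],
    "G3": ["G1"],
    "G4": ["G2", "G5"],
    "G5": ["G4"],
}
signature = ["G1"]  # stand-in for the initial LASSO-selected genes

# Hypothetical precomputed EAF scores per ancestry and expression variances.
eaf = {"afr": {"G2": 0.1, "G4": 0.9}, "eas": {"G3": 0.7, "G5": 0.95}}
expr_var = {"G2": 1.0, "G3": 0.5, "G4": 2.0}

candidates = neighbors_within(network, signature, max_hops=2)
chosen = select_by_eaf(candidates, eaf, expr_var, top_k=1)
extended = sorted(set(signature) | {g for genes in chosen.values() for g in genes})
```

In this toy example, G5 has a high EAF score but lies three hops from the signature gene, so it never enters the candidate set. The genes in `extended` would then be the ones forced into the retrained model.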
Marchal, I. AI tool adjusts for ancestral bias in genetic data.
Nat Biotechnol 43, 501 (2025). https://doi.org/10.1038/s41587-025-02651-7