Limitations of cell embedding metrics assessed using drifting islands

Data availability

All data in this study are publicly available. Statistics, resources and corresponding studies are listed in Extended Data Table 1.

Code availability

The implementation code for Islander, as well as tutorial notebooks to reproduce the results in this paper, is available on GitHub (https://github.com/Genentech/Islander). The standalone scGraph evaluation toolkit can be installed using pip (https://pypi.org/project/scgraph-eval/). For scIB evaluation pipelines, the implementations by Gayoso et al. were obtained from GitHub (https://github.com/yoseflab/scib-metrics).

References

  1. de Sande, B. V. et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat. Rev. Drug Discov. 22, 496–520 (2023).

  2. Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).

  3. Rood, J. E. et al. Impact of the Human Cell Atlas on medicine. Nat. Med. 28, 2486–2496 (2022).

  4. Rood, J. E. et al. The Human Cell Atlas from a cell census to a unified foundation model. Nature 637, 1065–1071 (2025).

  5. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

  6. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  7. Heimberg, G. et al. A cell atlas foundation model for scalable search of similar human cells. Nature 638, 1085–1094 (2025).

  8. Rosen, Y. et al. Universal cell embeddings: a foundation model for cell biology. Preprint at bioRxiv https://doi.org/10.1101/2023.11.28.568918 (2023).

  9. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  10. Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  11. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  12. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).

  13. Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).

  14. Liu, T., Li, K., Wang, Y., Li, H. & Zhao, H. Evaluating the utilities of foundation models in single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.09.08.555192 (2023).

  15. Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).

  16. Zhang, H., Cisse, M., Dauphin, Y. N. & Lopez-Paz, D. mixup: beyond empirical risk minimization. Preprint at https://arxiv.org/abs/1710.09412 (2018).

  17. Siletti, K. et al. Transcriptomic diversity of cell types across the adult human brain. Science 382, eadd7046 (2023).

  18. Kumar, T. et al. A spatially resolved single-cell genomic atlas of the adult human breast. Nature 620, 181–191 (2023).

  19. Wang, S. K. et al. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom. 2, 100164 (2022).

  20. Elmentaite, R. et al. Single-cell sequencing of developing human gut reveals transcriptional links to childhood Crohn’s disease. Dev. Cell 55, 771–783.e5 (2020).

  21. Knight-Schrijver, V. R. et al. A single-cell comparison of adult and fetal human epicardium defines the age-associated changes in epicardial activity. Nat. Cardiovasc. Res. 1, 1215–1229 (2022).

  22. He, P. et al. A human fetal lung cell atlas uncovers proximal–distal gradients of differentiation and key regulators of epithelial fates. Cell 185, 4841–4860.e25 (2022).

  23. Solé-Boldo, L. et al. Single-cell transcriptomes of the human skin reveal age-related loss of fibroblast priming. Commun. Biol. 3, 188 (2020).

  24. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  25. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  26. Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).

  27. Polański, K. et al. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36, 964–965 (2020).

  28. Haghverdi, L. et al. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  29. Lopez, R. et al. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  30. Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).

  31. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  32. De Donno, C. et al. Population-level integration of single-cell datasets enables multi-scale analysis across samples. Nat. Methods 20, 1683–1692 (2023).

  33. Khosla, P. et al. Supervised contrastive learning. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 18661–18673 (NeurIPS, 2020).

  34. Hoffer, E. & Ailon, N. Deep metric learning using triplet network. In Similarity-Based Pattern Recognition: SIMBAD 2015 (eds Feragen, A. et al.) 84–92 (Springer, 2015).

  35. Sikkema, L. et al. An integrated cell atlas of the human lung in health and disease. Nat. Med. 29, 1563–1577 (2023).

  36. Xu, C. et al. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell 186, 5876–5891.e20 (2023).

  37. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  38. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  39. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).

  40. Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).

  41. Su, Y. et al. Multi-omics resolves a sharp disease-state shift between mild and moderate COVID-19. Cell 183, 1479–1495.e20 (2020).

  42. Luecken, M. et al. Benchmarking atlas-level data integration in single-cell genomics—integration task datasets. figshare https://doi.org/10.6084/m9.figshare.12420968 (2022).

Acknowledgements

We thank R. Lopez, R. Sosic, P. He, M. Bereket, L. Dony, S.-J. Dunn, G. Eraslan, A. Gayoso, G. Heimberg, K. Huang, J. Marioni, D. Pe’er, L. Peng, Y. Roohani, Y. Rosen, A. Whitehead and J. Zhang for invaluable insights, along with all the members from the J.L. and A.R. labs and colleagues at the Human Cell Atlas, Chan Zuckerberg Initiative and Google DeepMind, for constructive and insightful discussions. J.L. was supported by the National Science Foundation through grants OAC-1835598 (CINES), CCF-1918940 (Expeditions) and DMS-2327709 (IHBEM), the Stanford Data Applications Initiative, the Wu Tsai Neurosciences Institute, the Stanford Institute for Human-Centered Artificial Intelligence, the Chan Zuckerberg Initiative, Amazon, Genentech, GSK, Hitachi, SAP and UCB.

Author information

Authors and Affiliations

  1. Genentech Research and Early Development, Genentech, South San Francisco, CA, USA

    Hanchen Wang & Aviv Regev

  2. Department of Computer Science, Stanford University, Palo Alto, CA, USA

    Hanchen Wang & Jure Leskovec

Contributions

H.W. and A.R. conceptualized the study. H.W. performed the experiments. H.W., J.L. and A.R. wrote the paper.

Corresponding authors

Correspondence to Jure Leskovec or Aviv Regev.

Ethics declarations

Competing interests

H.W. and A.R. are employees of Genentech, a member of the Roche Group. A.R. has equity in Roche. A.R. is a cofounder and equity holder of Celsius Therapeutics and is an equity holder in Immunitas. Until 31 July 2020, A.R. was a scientific advisory board member of Thermo Fisher Scientific, Syros Pharmaceuticals, Neogene Therapeutics and Asimov. A.R. is a named inventor on multiple filed patents related to single-cell and spatial genomics, including for scRNA-seq, spatial transcriptomics, Perturb-Seq, compressed experiments and PerturbView.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Table 1 Statistics of cell atlases

Extended Data Table 2 Benchmarking cell embeddings using scIB with default annotations for 144 cell types on the Human Fetal Lung Cell Atlas, the donor split

Extended Data Table 3 Benchmarking cell embeddings using the scIB framework with a broad annotation of 14 cell types on the Human Fetal Lung Atlas

Extended Data Table 4 Benchmarking cell embeddings, using scGraph

Extended Data Table 5 Benchmarking cell embeddings using scIB and scGraph with default annotations for 9 cell subtypes of fibroblasts, applied to the fibroblast subset of the Human Fetal Lung Cell Atlas. All methods are re-trained on this subset

Extended Data Fig. 1

Drifting Cell Islands, different runs of Islander on fetal lung atlas (donor).

Extended Data Fig. 2 Design optimization for scGraph using human fetal lung atlas (ref. 22).

a,b, Distribution of raw (a) and log1p-transformed (b) scRNA-seq counts. c, scGraph scores using log1p counts do not effectively flag distortions caused by drifting cell islands. scGraph scores (y axis) for embeddings generated with each method (x axis) using log1p counts. d,e, Effect of trim rate on PCA centroid locations and scGraph scores. d, Normalized mean square error (MSE, y axis) between centroids at different trimming rates (x axis), with centroids at 49% trimming as reference. e, Percentage difference (y axis) between scGraph scores at various trimming rates (x axis) compared to the score at 49% trimming. Although small trim rates lead to larger changes in centroid coordinates, the corresponding changes in scGraph scores are relatively minor. Based on these observations, we selected a trim rate of 5% per side (10% total).
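
The trimmed centroid described in this caption can be illustrated with a minimal sketch. It is not the scGraph implementation: the function names `trimmed_centroid` and `mean_squared_error` are hypothetical, and the sketch assumes that trimming is applied independently to each coordinate and that the MSE is left unnormalized, neither of which the caption specifies.

```python
import math


def trimmed_centroid(points, trim_per_side=0.05):
    """Per-coordinate trimmed mean of a list of equal-length vectors.

    Hypothetical helper for illustration; not part of the scGraph API.
    trim_per_side is the fraction of values dropped at EACH end, so
    0.05 corresponds to "5% per side (10% total)".
    """
    if not points:
        raise ValueError("points must be non-empty")
    if not 0.0 <= trim_per_side < 0.5:
        raise ValueError("trim_per_side must be in [0, 0.5)")
    n = len(points)
    k = math.floor(n * trim_per_side)  # number of values dropped per end
    centroid = []
    for column in zip(*points):  # iterate over coordinates
        kept = sorted(column)[k:n - k]
        centroid.append(sum(kept) / len(kept))
    return centroid


def mean_squared_error(a, b):
    """Plain (unnormalized) MSE between two centroids of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy data: 19 well-behaved cells plus one extreme outlier on the first axis.
cells = [[float(i), 1.0] for i in range(19)] + [[1000.0, 1.0]]

plain = trimmed_centroid(cells, trim_per_side=0.0)     # ordinary mean
robust = trimmed_centroid(cells, trim_per_side=0.05)   # drops 1 value per end
shift = mean_squared_error(plain, robust)
```

In this toy case the ordinary mean is pulled far toward the outlier, while the 5% per-side trimmed centroid stays near the bulk of the cells, which is the kind of robustness that motivates trimming before computing centroids.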

Extended Data Fig. 3 Scoring human fetal lung fibroblast (ref. 22) embeddings by scIB and scGraph metrics.

a–c, Embeddings of 31,020 human fetal lung fibroblast profiles from 9 fibroblast subtypes across 29 batches, generated by the top-scoring methods based on scIB (scANVI and Islander) or scGraph (Harmony and Authors’) and colored by developmental stage (a), cell type (b) or batch (c). Each method was trained on this subset and evaluated using both scIB and scGraph (Extended Data Table 5). d,e, Rankings of integration methods. scGraph (d, y axis) and scIB (e, y axis) scores for each of the 9 integration methods (x axis).

Supplementary information

About this article

Cite this article

Wang, H., Leskovec, J. & Regev, A. Limitations of cell embedding metrics assessed using drifting islands. Nat. Biotechnol. (2025). https://doi.org/10.1038/s41587-025-02702-z

  • DOI: https://doi.org/10.1038/s41587-025-02702-z
