The sequencing data for the CASTLE panel produced in this study are openly available at NCBI SRA BioProject PRJNA1086849. Sequencing data for the clinical samples are under controlled access and are available through dbGaP study phs002529. Individual accession codes of SRA and dbGaP datasets are provided in Supplementary Table 2 and at https://github.com/CASTLE-Panel/castle. The outputs of all tools, evaluations and command line scripts are available on Zenodo at https://doi.org/10.5281/zenodo.10856827 (ref. 74). The hg38 reference genome is available via NCBI (GCF_000001405.26). The 1000 Genomes Vienna SV panel is available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/release/v1.0/delly-unfiltered-hg38/. Accession codes and references for publicly available datasets (COLO829, HCC1395, HG002, CHM1 and CHM13) are available in Supplementary Table 2.
Cosenza, M. R., Rodriguez-Martin, B. & Korbel, J. O. Structural variation in cancer: role, prevalence, and mechanisms. Annu. Rev. Genomics Hum. Genet. 23, 123–152 (2022).
Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
Carvalho, C. M. B. & Lupski, J. R. Mechanisms underlying structural variant formation in genomic disorders. Nat. Rev. Genet. 17, 224–238 (2016).
Drews, R. M. et al. A pan-cancer compendium of chromosomal instability. Nature 606, 976–983 (2022).
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Cameron, D. L. et al. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res. 27, 2050–2060 (2017).
Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Fan, X., Abbott, T. E., Larson, D. & Chen, K. BreakDancer: identification of genomic structural variation from paired-end read mapping. Curr. Protoc. Bioinformatics 45, 15.6.1–15.6.11 (2014).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).
Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat. Biotechnol. 40, 672–680 (2022).
Zarate, S. et al. Parliament2: accurate structural variant calling at scale. Gigascience 9, giaa145 (2020).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).
Lin, J.-H., Chen, L.-C., Yu, S.-C. & Huang, Y.-T. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants. Bioinformatics 38, 1816–1822 (2022).
Mahmoud, M., Doddapaneni, H., Timp, W. & Sedlazeck, F. J. PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation. Genome Biol. 22, 268 (2021).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).
Sakamoto, Y. et al. Long-read sequencing for non-small-cell lung cancer genomes. Genome Res. 30, 1243–1257 (2020).
Sakamoto, Y. et al. Phasing analysis of lung cancer genomes using a long read sequencer. Nat. Commun. 13, 3464 (2022).
Fujimoto, A. et al. Whole-genome sequencing with long reads reveals complex structure and origin of structural variation in human genetic variations and somatic mutations in cancer. Genome Med. 13, 65 (2021).
Rausch, T. et al. Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures. Cell Genom. 3, 100281 (2023).
Rossi, N. M. et al. Extrachromosomal amplification of human papillomavirus episomes is a mechanism of cervical carcinogenesis. Cancer Res. 83, 1768–1781 (2023).
Zhou, L. et al. Long-read sequencing unveils high-resolution HPV integration and its oncogenic progression in cervical cancer. Nat. Commun. 13, 2563 (2022).
Akagi, K. et al. Intratumoral heterogeneity and clonal evolution induced by HPV integration. Cancer Discov. 13, 910–927 (2023).
Hadi, K. et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell 183, 197–210 (2020).
Aganezov, S. & Raphael, B. J. Reconstruction of clone- and haplotype-specific cancer genome karyotypes from bulk tumor samples. Genome Res. 30, 1274–1290 (2020).
Shale, C. et al. Unscrambling cancer genomes via integrated analysis of structural variation and copy number. Cell Genom. 2, 100112 (2022).
Choo, Z.-N. et al. Most large structural variants in cancer genomes can be detected without long reads. Nat. Genet. 55, 2139–2148 (2023).
Shiraishi, Y. et al. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res. 51, e74 (2023).
Elrick, H. et al. SAVANA: reliable analysis of somatic structural variants and copy number aberrations in clinical samples using long-read sequencing. Preprint at bioRxiv https://doi.org/10.1101/2024.07.25.604944 (2024).
Park, J. et al. DeepSomatic: accurate somatic small variant discovery for multiple sequencing technologies. Preprint at bioRxiv https://doi.org/10.1101/2024.08.16.608331 (2024).
O’Neill, K. et al. Long-read sequencing of an advanced cancer cohort resolves rearrangements, unravels haplotypes, and reveals methylation landscapes. Cell Genom. 4, 100674 (2024).
Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007).
Lee, Y. & Lee, H. Integrative reconstruction of cancer genome karyotypes using InfoGenomeR. Nat. Commun. 12, 2467 (2021).
English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Kirsche, M. et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023).
Denti, L., Khorsand, P., Bonizzoni, P., Hormozdiari, F. & Chikhi, R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat. Methods 20, 550–558 (2023).
Wang, S. et al. De novo and somatic structural variant discovery with SVision-pro. Nat. Biotechnol. 43, 181–185 (2025).
Chen, Y. et al. Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat. Commun. 14, 283 (2023).
Kolmogorov, M. et al. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat. Methods 20, 1483–1492 (2023).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Steinberg, K. M. et al. Single haplotype assembly of the human genome from a hydatidiform mole. Genome Res. 24, 2066–2076 (2014).
Espejo Valle-Inclan, J. et al. A multi-platform reference for somatic structural variation detection. Cell Genom. 2, 100139 (2022).
Velazquez-Villarreal, E. I. et al. Single-cell sequencing of genomic DNA resolves sub-clonal heterogeneity in a melanoma cell line. Commun. Biol. 3, 318 (2020).
Paulin, L. F. et al. The benefit of a complete reference genome for cancer structural variant analysis. Preprint at medRxiv https://doi.org/10.1101/2024.03.15.24304369 (2024).
Fang, L. T. et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat. Biotechnol. 39, 1151–1160 (2021).
Talsania, K. et al. Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies. Genome Biol. 23, 255 (2022).
McDaniel, J. H. et al. Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor–normal pair. Preprint at bioRxiv https://doi.org/10.1101/2024.09.18.613544 (2024).
Zhao, Q. et al. Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line. Proc. Natl Acad. Sci. USA 106, 1886–1891 (2009).
Akdemir, K. C. et al. Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer. Nat. Genet. 52, 294–305 (2020).
Schloissnig, S. et al. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. Preprint at bioRxiv https://doi.org/10.1101/2024.04.18.590093 (2024).
Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
Peterson, J. F. et al. Acute leukemias harboring KMT2A/MLLT10 fusion: a 10-year experience from a single genomics laboratory. Genes Chromosomes Cancer 58, 567–577 (2019).
Lansdon, L. A. et al. Successful classification of clinical pediatric leukemia genetic subtypes via structural variant detection using HiFi long-read sequencing. Preprint at medRxiv https://doi.org/10.1101/2024.11.05.24316078 (2024).
Pollard, J. A. et al. Gemtuzumab ozogamicin improves event-free survival and reduces relapse in pediatric KMT2A-rearranged AML: results from the phase III Children’s Oncology Group Trial AAML0531. J. Clin. Oncol. 39, 3149–3160 (2021).
van Belzen, I. A. E. M. et al. Complex structural variation is prevalent and highly pathogenic in pediatric solid tumors. Cell Genom. 4, 100675 (2024).
Brady, S. W. et al. The genomic landscape of pediatric acute lymphoblastic leukemia. Nat. Genet. 54, 1376–1389 (2022).
Kazantseva, E., Donmez, A., Frolova, M., Pop, M. & Kolmogorov, M. Strainy: phasing and assembly of strain haplotypes from long-read metagenome sequencing. Nat. Methods 21, 2034–2043 (2024).
Cohen, A. S. A. et al. Genomic answers for children: dynamic analyses of >1,000 pediatric rare disease genomes. Genet. Med. 24, 1336–1348 (2022).
Martin, M. et al. WhatsHap: fast and accurate read-based phasing. Preprint at bioRxiv https://doi.org/10.1101/085050 (2016).
Alekseyev, M. A. & Pevzner, P. A. Breakpoint graphs and ancestral genome reconstructions. Genome Res. 19, 943–957 (2009).
Malhotra, A. et al. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 23, 762–776 (2013).
Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574 (2021).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Magi, A. et al. GASOLINE: detecting germline and somatic structural variants from long-reads data. Sci. Rep. 13, 20817 (2023).
Keskus, A., Bryant, A. & Kolmogorov, M. Supporting data for the manuscript ‘Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads’. Zenodo https://doi.org/10.5281/zenodo.14541057 (2024).
Keskus, A. et al. KolmogorovLab/Severus: a tool for somatic structural variant calling using long reads. GitHub https://github.com/KolmogorovLab/Severus (2024).
Bryant, A. et al. KolmogorovLab/minda. GitHub https://github.com/KolmogorovLab/minda (2024).
The work was supported, in part, by the Intramural Research Program of the National Institutes of Health (NIH). This work used the computational resources of the NIH High-Performance Computing Biowulf cluster (http://hpc.nih.gov). ONT sequencing of the HCC1395 cell line was supported by the National Cancer Institute of the NIH under award number U01CA253405. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We would like to thank the participants and families who donated their samples for this research. M.S.F. and E.G. would like to thank Braden’s Hope for Childhood Cancer, Elizabeth and Monte McDowell, the Black & Veatch Foundation, Curing Kids Cancer and Big Slick for their generous support. M.S.F., E.G. and L.A.L. would also like to thank Children’s Mercy Oncology Biorepository study personnel, including J. Vun, A. Hatfield and R. Ryan, as well as J. Seymour and K. Sanders in the Children’s Mercy Research Institute Biorepository for their assistance with sample collection and processing and M. Gibson, A. Walter and L. Puckett in the Children’s Mercy Research Institute Genomics Core for their assistance with sequencing. Y.L. is funded by the NCI-UMD Partnership Program. E.K.M. was supported by the State of Maryland. B.P. was supported by the NHGRI under award numbers R01HG010485, U01HG013748, U24HG011853, U24HG010262 and U41HG010972 and NIH award OT2OD033761. K.H.M. was supported by NIH/NHGRI R01HG011274. We thank A. Liss for creating the broadly consented pancreatic cancer cell line HG008-T. We thank J. McDaniel, V. Patel, N. Olson, J. Wagner and J. Zook at NIST and C. Xiao at NCBI for providing guidance and documentation for using the HG008 data and the GIAB Consortium for releasing all data publicly without embargo. The chromosome illustrations in Figs. 5 and 6 and Extended Data Figs. 7 and 8 were created using BioRender (https://BioRender.com/z86y662). 
We acknowledge the Gurobi team for providing an academic license free of charge.
S.A. is an employee and stockholder of ONT. A.K., P.C., K.S., D.C. and A.C. are employees of Google and own Alphabet stock as part of the standard compensation package. E.G. served on advisory boards for Jazz Pharmaceuticals and Syndax Pharmaceuticals. M.S.F. is part of the speakers bureau for Bayer and PacBio. The other authors declare no competing interests.
Nature Biotechnology thanks Q. Chris Liu, Kai Ye and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Keskus, A.G., Bryant, A., Ahmad, T. et al. Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02618-8