The 1000 Genomes Project ONT Sequencing Consortium (1KGP-ONT) is building on the landmark work done by the 1000 Genomes Project (1KGP), which began in 2008 as a collaborative initiative to establish a database of normal human genetic variation by sequencing the genomes of over a thousand healthy individuals from diverse ancestries. In the end, the 1KGP study sequenced over 3,000 genomes using short-read sequencing, and it continues to provide invaluable insights into human genetic diversity.
The 1KGP-ONT kicked off on Thursday, June 30, 2022. Funding permitting, we hope to perform long-read sequencing of all 1KGP samples, which are available as DNA or cell lines from the NHGRI's Sample Repository for Human Genetic Research housed at the Coriell Institute for Medical Research. To obtain high-coverage, high-quality long-read assemblies, we are isolating high molecular weight DNA directly from cultured 1KGP cell lines obtained from Coriell.
The goal of the 1KGP-ONT Consortium is to identify a broader spectrum of genomic variation than is possible using short-read sequencing, so that we can further improve our understanding of human genetic disease. This dataset is already enabling us to better understand normal patterns of human structural variation, identify variation in difficult-to-map regions of the genome, and study repeat expansions and methylation patterns.
We are a collaborative group of researchers from around the world interested in leveraging long-read sequencing to better understand the normal patterns of structural variation, methylation, and repeat expansion in the population so we can more effectively identify missing disease-causing variation in individuals. The project is led by Danny Miller and Evan Eichler at the University of Washington, and cell culture and DNA extraction are performed in the Miller and Eichler labs. Sequencing is performed at the University of Washington, the New York Genome Center, and Stanford University. Individuals and institutions contributing to this work are listed below.
Long-read sequencing is being performed on the Oxford Nanopore platform. This technology works by measuring changes in current as single-stranded DNA or RNA molecules pass through a protein pore. The first 100 samples were sequenced on the R9.4.1 pore, and subsequent samples will be sequenced using the R10 chemistry.
The 1KGP-ONT Consortium is committed to publicly releasing data as they are generated, after basecalling and standard QC. Raw sequencing data, processed data, and summary data can be found here.
An analysis of the first 100 genomes sequenced for this project has been published in Genome Research (PMID 39358015). The initial 100 samples:
The 1KGP-ONT Consortium is open to all. If you are interested in joining the consortium and being added to the Slack group, please contact Danny Miller.