The genome is the unique blueprint of every living organism on earth. In humans, as in almost every other species, exhaustive research has demonstrated that it acts in concert with environmental and epigenetic factors. Changes in it are undeniably one of the pillars of disease pathogenesis - or lack thereof. It comes as no surprise that researchers have been attempting to decode it since the days of Watson and Crick, when we finally started to understand what we were really looking for.
The culmination of these loose efforts was the Human Genome Project1. This publicly funded project began in 1990 and was completed in 2003, having sequenced the euchromatic human genome. To do so, DNA samples were first cut up into more manageable fragments and then cloned in BACs (bacterial artificial chromosomes), creating BAC libraries. These libraries were then further subdivided into smaller fragments, which were sequenced. The sequencing techniques leverage our own biological machinery in the form of DNA polymerase. By using fluorescence and cycling the addition of reagents, researchers can determine the order in which bases are incorporated.
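The two-level fragmentation described above can be illustrated with a toy sketch. This is not the project's actual pipeline: the helper names (`fragment`, `hierarchical_shotgun`), the sizes, and the random toy sequence are all assumptions made for illustration, and real BAC inserts are on the order of 100 to 200 kilobases rather than a hundred bases.

```python
import random

def fragment(sequence, size, count, rng):
    """Sample `count` random substrings of length `size` (hypothetical helper).

    Returns (start, substring) pairs so each piece can be traced back.
    """
    if size > len(sequence):
        raise ValueError("fragment size exceeds sequence length")
    starts = [rng.randrange(len(sequence) - size + 1) for _ in range(count)]
    return [(start, sequence[start:start + size]) for start in starts]

def hierarchical_shotgun(genome, clone_size, clones, read_size, reads_per_clone, seed=0):
    """Toy two-level shotgun: large 'clones' first, then short reads per clone."""
    rng = random.Random(seed)
    reads = []
    for clone_start, clone in fragment(genome, clone_size, clones, rng):
        for offset, read in fragment(clone, read_size, reads_per_clone, rng):
            # Record the read's position in the original genome.
            reads.append((clone_start + offset, read))
    return reads

# Assumed toy input: a random 500-base sequence standing in for a genome.
toy_rng = random.Random(42)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(500))

reads = hierarchical_shotgun(
    toy_genome, clone_size=100, clones=10, read_size=20, reads_per_clone=15
)
print(len(reads), "reads sampled")
```

In a real experiment the read positions are of course unknown; they are kept here only so the sketch can be checked against the toy sequence.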
After this “shotgun sequencing”2, the raw and highly redundant data is processed with a number of bioinformatics techniques to reconstruct the true sequence - or at least our best guess. This is where coverage comes in. The standard 30x coverage for human whole genome sequencing refers to the number of times, on average, a single position is read. This redundancy affords more reliable predictions of what the actual sequence is. Of course, one can sequence to much greater depth for more certainty, or use ultra-low coverage (low-pass sequencing) paired with imputation algorithms to fill in the gaps.
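The arithmetic behind a coverage figure can be sketched in a few lines. Mean coverage is commonly computed as the number of reads times the read length, divided by the genome size. The genome size (about 3.1 billion bases) and the 150-base read length below are assumed values for illustration, and the uncovered-fraction estimate rests on the idealized assumption that reads land uniformly at random, which real data only approximates.

```python
import math

def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each position is read: C = N * L / G."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

def fraction_uncovered(coverage):
    """Expected fraction of positions with zero reads.

    Assumes per-position depth is Poisson-distributed with mean `coverage`
    (an idealization), so P(depth = 0) = exp(-coverage).
    """
    return math.exp(-coverage)

GENOME_SIZE = 3_100_000_000  # assumed approximate human genome size in bases
READ_LENGTH = 150            # assumed short-read length in bases

n_reads = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(f"reads for 30x: {n_reads:,}")
print(f"mean coverage: {mean_coverage(n_reads, READ_LENGTH, GENOME_SIZE):.1f}x")
print(f"uncovered at 30x:  {fraction_uncovered(30):.2e}")
print(f"uncovered at 0.5x: {fraction_uncovered(0.5):.2f}")
```

Under these assumptions, 30x leaves essentially no position unread, while at 0.5x roughly 60% of positions are never observed, which is why low-pass data depends on algorithms to fill in the gaps.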
Genome sequencing does not exist in a vacuum - there are important economic, social, and political forces to consider. The Human Genome Project was an international effort that lasted more than a decade and required billions of dollars. Today, one can provide a saliva sample and have one's entire genetic sequence on a laptop in a few days for close to $1,000.
Somewhat surprisingly, most of the sequencing done today relies on trusted techniques similar to those of the Human Genome Project, namely sequencing by synthesis (SBS). These techniques, however, are much more efficient and are coupled with ever more complex computational solutions. Beyond the massive increase in data output, the introduction of next-generation sequencing (NGS) technology has transformed the way scientists think about genetic information.
The $1,000 genome enables population-scale sequencing and establishes the foundation for personalized genomic medicine as part of standard medical care. Researchers can now analyze thousands to tens of thousands of samples in a single year.
The undisputed heavyweight in the field is Illumina. Its range of products using the sequencing by synthesis (SBS) approach accounts for the vast majority of sequencing today. Its highest-throughput offerings include the HiSeq X and NovaSeq systems, which promise to bring the cost of human whole genome sequencing down even further. Illumina follows a “printer-and-cartridge” model, however, making most of its money not from sequencer sales but from preparation kits and, especially, reagents. Although Illumina is the biggest player, it is by no means the only company, nor is SBS the only approach to genome sequencing. Notable disruptors such as PacBio and Oxford Nanopore use completely different approaches to sequencing, each with its own benefits and drawbacks.
Furthermore, myriad other companies, such as 10x Genomics, leverage proprietary preparation techniques and computational approaches to extend and improve existing sequencing capabilities in novel ways.
Considering the history and current state of human whole genome sequencing, we appear to be well on our way to widespread personalized medicine. With continued research and investment in this exciting and still somewhat nascent field, one could imagine a not-too-distant future where our entire healthcare landscape shifts from reactive to proactive.
2. Kaiser, Olaf, et al. "Whole genome shotgun sequencing guided by bioinformatics pipelines - an optimized approach for an established technique." Journal of Biotechnology 106.2-3 (2003): 121-133.