While a lot of useful information can be gleaned from analyzing DNA, it represents a somewhat static view of the organism. Outside of somatic mutations, DNA sequences change very little throughout the organism’s life cycle. Life, however, is dynamic by nature, and being able to explore the constant change that happens as an organism responds to its environment is a crucial part of the whole picture. Much of this change can be observed in the transcriptome, the set of RNA molecules present in a cell or group of cells. Whole exome sequencing tells us which protein-coding sequences an organism carries, but it says little about how actively they are used. Transcriptomics measures gene expression far more directly, because it captures the RNA that precedes protein synthesis.

Transcriptomics is not a young field, but advances in sequencing technology have enabled much more sophisticated experiments. Originally, most gene expression studies relied on hybridization-based microarrays. These were limited by cross-hybridization artifacts and by the need to know the target sequences beforehand. Sequence-based alternatives, such as Sanger sequencing of expressed sequence tag (EST) libraries, removed the need for prior sequence knowledge, but preparing these libraries and the Sanger sequencing itself are very time-consuming. With the advent of next-generation sequencing, studies investigating RNA can have considerable scope and still be completed within a reasonable time frame. [1]

The typical workflow for an RNA-Seq experiment involves RNA extraction, isolation, conversion to cDNA, library preparation, and finally sequencing[2]. Most steps in the workflow are fairly similar across protocols; the main exception is library preparation, where researchers can isolate different RNA populations depending on what they’re looking for. Often, researchers deplete ribosomal RNA (rRNA), which accounts for most of the RNA present in the transcriptome. Once rRNA has been removed, messenger RNA (mRNA) can be amplified and sequenced more effectively. Another approach with a similar effect is to select mRNA by its polyadenylated tails and then amplify it. Finally, researchers can target specific RNAs using gene-specific oligonucleotide primers[3]. With these techniques, researchers can investigate differences in gene expression between test groups, changes within one group over time or in response to a stimulus, and much more.
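
The between-group comparison mentioned above can be illustrated with a minimal Python sketch. It assumes we already have raw read counts per gene for one control and one treated sample, scales each to counts per million (CPM) so that sequencing depth does not dominate, and then reports a log2 fold change. The gene names, counts, and pseudocount are hypothetical values chosen for illustration, and this is not a substitute for a real differential expression analysis, which would use replicates and a statistical model.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that the sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene on CPM-normalized counts.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one of the samples.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated.get(gene, 0.0) + pseudocount)
            / (cpm_control[gene] + pseudocount)
        )
        for gene in cpm_control
    }


# Hypothetical toy counts, not real data.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}: {value:+.2f}")
```

In this toy example geneA rises roughly fourfold, geneB is unchanged, and geneC is roughly halved. With a single sample per group there is no way to tell a real change from noise, which is why published analyses rely on replicates.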

After the RNAs of interest have been selected, they must be converted to cDNA before sequencing. Not only is DNA much more stable than RNA, but almost all sequencing platforms require it. Unfortunately, information about which strand a transcript came from is lost during standard cDNA synthesis. Strand-specific protocols preserve this information at the cost of extra steps[4]. Quality control is often done using a standardized set of spike-ins[5], synthetic RNA standards added in known amounts, which allow researchers to assess the quality of their samples after sequencing. Following cDNA synthesis, the researcher must choose a sequencing platform that suits their needs. Broadly speaking, sequencing is carried out with either long reads or short reads. Long reads can be produced on platforms such as those offered by PacBio or Oxford Nanopore, whereas short reads are produced on Illumina’s sequencers. The project’s goals, such as de novo assembly, determine which sequencer is the better fit.
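
As a rough illustration of how spike-ins can be used for quality assessment, the sketch below compares the known input amount of each spike-in with the number of reads observed for it, on a log2 scale, and reports the Pearson correlation. A value close to 1 suggests that read counts track input abundance well. The spike-in identifiers, concentrations, counts, and pseudocount are all hypothetical, and this is one simple check under those assumptions, not the procedure described in [5].

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spike_in_correlation(expected_conc, observed_counts, pseudocount=1.0):
    """Correlate log2 known concentration with log2 observed read count.

    Spike-ins missing from the observed counts are treated as zero reads;
    the pseudocount (an assumption of this sketch) keeps log2 defined.
    """
    ids = sorted(expected_conc)
    xs = [math.log2(expected_conc[i]) for i in ids]
    ys = [math.log2(observed_counts.get(i, 0) + pseudocount) for i in ids]
    return pearson(xs, ys)


# Hypothetical spike-in mix: relative input amounts and observed read counts.
expected = {"spike1": 1.0, "spike2": 8.0, "spike3": 64.0, "spike4": 512.0}
observed = {"spike1": 12, "spike2": 95, "spike3": 810, "spike4": 6300}

r = spike_in_correlation(expected, observed)
print(f"log-log Pearson r = {r:.4f}")
```

Working on a log scale is a deliberate choice here: spike-in amounts usually span several orders of magnitude, so an untransformed correlation would be dominated by the most abundant standards.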

Since RNA-Seq became a mainstream approach to transcriptomics, countless groundbreaking studies have been undertaken. These range from understanding splice junctions[6] and differential gene expression[7] to investigating the bizarre world of tumor microenvironments[8]. However, the technology has its limitations. Quantifying gene expression within non-human genomes can be challenging[9], for example. Detecting alternative splicing patterns and the aforementioned problem of strand specificity are further challenges[10]. Finally, accurately identifying novel transcripts[11] from RNA-Seq data remains difficult. Fortunately, the field is ever evolving, with approaches such as Iso-Seq and single-cell RNA-Seq aimed squarely at these problems.


[1] Diagram by Thomas Shafee

[2] Kukurba, Kimberly R., and Stephen B. Montgomery. "RNA sequencing and analysis." Cold Spring Harbor protocols 2015.11 (2015): pdb-top084970.

[3] Frohman, Michael A., Michael K. Dush, and Gail R. Martin. "Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer." Proceedings of the National Academy of Sciences 85.23 (1988): 8998-9002.

[4] Parkhomchuk, D., T. Borodina, V. Amstislavskiy, M. Banaru, L. Hallen, S. Krobitsch, H. Lehrach, and A. Soldatov. "Transcriptome analysis by strand-specific sequencing of complementary DNA." Nucleic Acids Research 37 (2009): e123.

[5] Jiang, L., F. Schlesinger, C. A. Davis, Y. Zhang, R. Li, M. Salit, T. R. Gingeras, and B. Oliver. "Synthetic spike-in standards for RNA-seq experiments." Genome Research 21.9 (2011): 1543-1551.

[6] Trapnell, Cole, Lior Pachter, and Steven L. Salzberg. "TopHat: discovering splice junctions with RNA-Seq." Bioinformatics 25.9 (2009): 1105-1111.

[7] Trapnell, Cole, et al. "Differential analysis of gene regulation at transcript resolution with RNA-seq." Nature biotechnology 31.1 (2013): 46.

[8] Patel, Anoop P., et al. "Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma." Science (2014): 1254257.

[9] Hirsch, Cory D., Nathan M. Springer, and Candice N. Hirsch. "Genomic limitations to RNA sequencing expression profiling." The Plant Journal 84.3 (2015): 491-503.

[10] Ozsolak, Fatih, and Patrice M. Milos. "RNA sequencing: advances, challenges and opportunities." Nature reviews genetics 12.2 (2011): 87.

[11] Weirick, Tyler, et al. "The identification and characterization of novel transcripts from RNA-seq data." Briefings in bioinformatics 17.4 (2015): 678-685.
