Introduction to RNA-seq
Genotype and phenotype. Historically, researchers have used these two readouts to determine what is happening – or could happen – in a cell, tissue, or organism. Using DNA sequencing to discover the genetic underpinnings of a physiological outcome was a primary goal of the Human Genome Project (HGP). Indeed, the past several decades have given us vast amounts of data about the consequences of alterations in DNA sequence. From chromosomal rearrangements to single nucleotide polymorphisms revealed by the HGP, we are building an extensive library of DNA markers for health and disease.
Of course, we also know now that relatively static genomic sequences are only part of the biological reality. DNA is invaluable for investigating heritable conditions and making predictive assessments of a biological sample, but it says very little about the dynamic, real-time operations of a cell. As a result, we are now turning to RNA sequencing to link gene expression and physiological conditions.
RNA sequencing (RNA-seq) allows researchers to make that link by capturing snapshots of gene expression in a cell, tissue, or organism at specific moments in time. This series will discuss RNA-seq from start to finish (though not in comprehensive detail), including how to set up a good RNA-seq experiment and how to deal with the resulting data. For now, though, we will simply discuss why RNA-seq may be the right choice for your next gene expression experiment.
Why Look at RNA?
Where DNA is the underlying blueprint for all cellular processes, RNA is the molecule produced on demand when those processes are needed. Proteins translated from messenger RNA then carry out the encoded functions. Thus, RNA sits at a unique position between DNA and protein. It can reveal problems in the underlying DNA code, as well as defects in processing machinery that ultimately lead to dysregulated gene expression or defective proteins. For example, a DNA coding region might look normal, yet a downstream processing problem could cause aberrant splicing of the resulting RNA molecule, leading in turn to a non-functional enzyme and a disease state. Not all splice variants are pathogenic, though. In some cases, sequencing the RNA reveals transcripts that encode different protein isoforms.
Advantages of RNA-seq
In many disease states, it’s not the DNA sequence that matters but the downstream expression of the encoded gene. In cancer, kidney disease, cardiovascular conditions, autoimmune disease, and many more, changes in gene expression are the underlying cause. Historically, researchers have used protein levels as a readout in these instances. It makes sense: protein is the final product of the DNA → RNA → protein pathway, so the functional molecules are theoretically the most biologically relevant.
Even so, RNA-seq holds a number of advantages over protein microarrays (and their sister assay, the RNA microarray). First, RNA-seq has a much lower detection threshold than arrays (learn more about the differences between the two assays in this blog post). Protein arrays involve hybridization of isolated protein to an assay plate that includes a fluorescent marker. More protein causes a brighter signal. That signal has to be high enough for detection, though, which requires a certain amount of starting material. RNA-seq experiments require as little as 100 picograms of starting material, and there are assays for RNA-seq on single cells. (See the Cofactor website for current offerings and sample submission requirements.)
A problem related to the amount of starting material is that fluorescent assays have inherent background. Thus, you need enough protein to bring the signal above that noise. Problems with background can also result in false positives. RNA-seq has an error rate too, but it is generally at the level of individual nucleotides (where a single base might be misread by the sequencer). Most of these errors can be dealt with through downstream quality control. Unlike microarrays, then, RNA-seq offers outstanding dynamic range, detecting lowly expressed transcripts alongside those expressed at high levels. The overall accuracy of RNA-seq is excellent and surpasses that of array data, as described in 2008 by Mortazavi et al.:
“[…] the bottom quartile of the Affymetrix ‘present’ calls showed no correlation with the RNA-Seq data (R² = 0.03), suggesting that many of the putatively ‘expressed’ RNAs identified by the microarray analysis might be false positives.”
One of the strongest features of RNA-seq is that it is unbiased and can be used to detect both known and unknown targets. This could mean sequencing the transcriptome of a new organism, or looking for novel gene fusions that occur naturally or in a disease state. Arrays – both protein and RNA – require known targets as bait because that bait must be printed on the assay plate itself. Arrays are therefore useful for detecting the presence, absence, or level of a known target, but can’t be used for discovery purposes.
Affordability and Speed
RNA-seq is similar to DNA sequencing but with an added step. RNA, rather than DNA, is extracted from the sample and then reverse transcribed to produce cDNA. From there, the cDNA is fragmented and run through a high-throughput next-generation sequencing system. Because it relies on the same instruments, RNA-seq has become more accessible to researchers as sequencing in general has advanced. The cost of sequencing has, of course, come down by several orders of magnitude over the past decade and a half. High-throughput sequencing is now an affordable tool, although the final cost will depend on experimental goals and design.
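For readers who think in code, the reverse transcription and fragmentation steps can be pictured with a toy Python sketch. This is only an illustration of the sequence logic, not a model of the wet-lab chemistry: the function names, the example RNA sequence, and the fixed fragment size of 8 bases are all made up for the example, and real fragmentation is random rather than evenly spaced.

```python
# Toy illustration only: library prep is wet-lab chemistry, not string handling.
# The names, the example sequence, and the fragment size are assumptions.

# Base-pairing rules for building DNA against an RNA template.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return first-strand cDNA (written 5'->3') complementary to an RNA."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def fragment(cdna: str, size: int) -> list[str]:
    """Cut cDNA into consecutive pieces of at most `size` bases."""
    return [cdna[i:i + size] for i in range(0, len(cdna), size)]


rna = "AUGGCCAUUGUAAUGGGCCGC"      # hypothetical transcript
cdna = reverse_transcribe(rna)     # complementary DNA copy
fragments = fragment(cdna, 8)      # short pieces ready for sequencing

print(cdna)
print(fragments)
```

The sequencer then reads these short fragments, and it is the job of the downstream bioinformatics to work out which transcript each read came from.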
Importantly, RNA-seq is highly reproducible and so generally does not require technical replicates. Money that would otherwise go toward repeating a run can instead be used for additional experimental arms or biological replicates.
Turnaround times for RNA-seq experiments are also improving. The Illumina NextSeq platform used at Cofactor can return 400 million sequencing reads in under 24 hours. Including library prep time, sequencing, and bioinformatics analysis, Cofactor can turn around a 24-sample experiment in about three weeks. As noted above, experimental design must be undertaken carefully to ensure the most efficient use of resources and the best possible data, but there are many resources available to help researchers develop their RNA-seq experiments.
Cofactor project scientists are always happy to serve as one of these resources, so get in touch with any questions and to discuss your experimental needs.