
(This post is a continuation of our RNA-seq series leading up to our RNA-seq DX™ Webinar on July 26th. Part 1, Part 2, Part 3)

The last two posts in our series focused on establishing experimental success: understanding adequate gene discovery (saturation characterization) and putting the data in the context of built-in controls (molecular spike-ins). Both of these techniques are driven by first principles. They don't necessarily help us characterize the samples better, but they lead to better data, which in turn strengthens our confidence in our discoveries. Our final topic brings us back to the actual transcriptome characterization that is at the heart of RNA-seq.

When diving into an experiment, the standard model assumes that >95% of the loci will have very similar expression profiles across samples, while <5% will have different expression profiles and ultimately be responsible for the difference in phenotype observed between samples. There are of course exceptions, and confirming adherence to the standard model is rarely the focus of an RNA-seq experiment, so why look here? At Cofactor we've observed that when the data drastically diverge from this model (identified both qualitatively and quantitatively), it's an indication of a problem with the samples or the experiment. Sometimes it's a problem that isn't caught until the researcher has already spent a considerable amount of time and money on characterization or something even further downstream.
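To make the idea concrete, here is a minimal Python sketch of one way to check how closely a pair of samples follows that 95% model. This is an illustration, not Cofactor's actual pipeline; the function name, fold-change threshold, and expression floor are all hypothetical choices.

```python
import numpy as np

def fraction_concordant(expr_a, expr_b, max_fold_change=2.0, min_expression=1.0):
    """Estimate the fraction of loci with similar expression between two samples.

    expr_a, expr_b: normalized expression values (e.g. TPM), one entry per locus,
    in the same order. Thresholds here are illustrative, not prescribed.
    """
    a = np.asarray(expr_a, dtype=float)
    b = np.asarray(expr_b, dtype=float)

    # Only consider loci expressed in at least one of the two samples.
    expressed = (a >= min_expression) | (b >= min_expression)
    a, b = a[expressed], b[expressed]

    # Call a locus "concordant" if its fold change stays within the threshold.
    fold_change = (a + 1e-9) / (b + 1e-9)
    concordant = (fold_change <= max_fold_change) & (fold_change >= 1.0 / max_fold_change)
    return concordant.mean()

# Under the standard model, we would expect a value above roughly 0.95;
# a much lower value is a hint that the samples or experiment deserve a
# closer look before any downstream characterization.
```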

Cofactor uses what is referred to as a transcriptome fingerprint for the qualitative view and assessment of a transcriptome characterization. Over the last 15 years, members of our team have used this technique to assess RNA-based experiments. A large amount of important information is presented in the simple plot shown below, information that is not inherently captured by summary statistics on the data (the classic lesson of Anscombe's quartet, https://en.wikipedia.org/wiki/Anscombe%27s_quartet). The key information conveyed by this plot is covered in Figure 1 below.

[Figure 1: transcriptome fingerprint]
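If you'd like a rough fingerprint-style view of your own data, here is a minimal sketch. The exact axes and processing behind Cofactor's fingerprint aren't spelled out in this post, so this simply plots per-locus expression for two samples on log-log axes, which is one common way to expose the structure that summary statistics hide. The function name and styling are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def fingerprint_plot(expr_a, expr_b, label_a="Sample A", label_b="Sample B"):
    """Log-log scatter of per-locus expression for two samples.

    A stand-in for a transcriptome fingerprint: most points should fall along
    the diagonal, with scatter widening at low coverage (the noise floor).
    """
    a = np.asarray(expr_a, dtype=float) + 1.0  # pseudocount so zeros still plot
    b = np.asarray(expr_b, dtype=float) + 1.0

    plt.figure(figsize=(5, 5))
    plt.scatter(np.log10(a), np.log10(b), s=4, alpha=0.3)
    plt.xlabel(f"log10 expression, {label_a}")
    plt.ylabel(f"log10 expression, {label_b}")
    plt.title("Transcriptome fingerprint (illustrative)")
    plt.tight_layout()
    plt.show()
```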

In addition to the qualitative assessment of the experiment, there are quantitative measures of adherence to the 95% concurrence model. The fingerprint data are analyzed to determine two things: the R² values they generate and the coverage value at which the data change from being broadly dispersed to being linear. The first is a general assessment of the data, while the second is a very practical tool for setting a signal/noise cutoff when considering candidates. I've seen countless researchers move forward with characterization without first understanding these parameters of their data. RNA-seq DX™ is the collection of techniques necessary to establish a successful RNA-seq study and includes what we've laid out in these blog posts. Next-gen sequencing is an extremely powerful tool, but more power often requires more attention to sensitivity, specificity, and precision to realize a tool's true potential.
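As a sketch of what those two quantitative measures could look like in practice (again, not the exact procedure described in this post), one could compute R² on log-transformed expression and then scan candidate coverage cutoffs until the between-sample relationship looks linear. The threshold list and target R² below are assumed values for illustration.

```python
import numpy as np

def r_squared(x, y):
    """Pearson R^2 between two equal-length arrays."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

def estimate_noise_cutoff(expr_a, expr_b, thresholds=(1, 2, 5, 10, 20, 50, 100), target_r2=0.95):
    """Find the lowest coverage cutoff at which the fingerprint looks linear.

    expr_a, expr_b: per-locus expression/coverage for two samples.
    thresholds: candidate coverage cutoffs to scan (illustrative defaults).
    target_r2: R^2 at which we treat the data as 'linear' (an assumed value).
    """
    a = np.asarray(expr_a, dtype=float)
    b = np.asarray(expr_b, dtype=float)

    for cutoff in thresholds:
        keep = (a >= cutoff) & (b >= cutoff)
        if keep.sum() < 100:  # too few loci left to judge reliably
            break
        r2 = r_squared(np.log10(a[keep]), np.log10(b[keep]))
        if r2 >= target_r2:
            return cutoff, r2  # loci below this coverage are treated as noise
    return None, None
```

Candidates whose coverage falls below the returned cutoff would then be treated with caution when weighing differential expression calls.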

I hope this series on RNA-seq DX™ has struck a chord with you. Our aim is for our fellow scientists to think beyond using data generation as an indication of success, and to step back to some fundamental requirements, controls, and hypotheses about the model. Our first post laid out the statement "RNA-seq sucks." Our opinion is that these tools will help researchers perform RNA-seq with confidence and precision. It's why we chose to name this approach DX: it's about differential expression with a diagnostic approach. I for one am excited.

For the full story and more, I hope you attend our webinar on Friday (July 26th) from 12-1 CST.

Register to attend. 

 
