We frequently talk to researchers about the small changes that can have a big impact on the results of an RNA-seq experiment. Over the coming week, I’m going to focus on six different things to consider when you start your RNA-seq experiment. Today I’m going to start with where we’d start in an actual experiment, the samples. I love molecular slang, by the way, and if you find any of it confusing, please refer to the bottom of the post for “Cofactor speak”.
1. Sample Quality and stringent QC measures
First and foremost, if a sample is of low quality or complexity, both of which can occur due to degradation, then it is best either not to sequence it or at least to understand the implications for every aspect of the experiment before you ever construct libraries, sequence, or analyze. I plan to discuss these implications, and the tools for dealing with them, over the next week… thanks for hanging with me!
Low quality or low complexity samples will increase noise in RNA-seq data, and this must be taken into account when filtering for statistically significant, differentially expressed candidates. Noise will come from amplification procedures, bias during library construction, or other molecular-based events. Essentially, in low quality or low complexity samples that are degraded, the fragmentation has already been performed by enzymes (or ‘ases) in the sample. Under normal circumstances, we control the level of fragmentation, and sometimes strict standards dictate that we perform some kind of size selection based on a gel or dual-SPRI cleanup (or Sage Pippin Prep or PE Labchip XT…). That is all great; however, when your sample is already fragmented by nature, which is not as random as we would like to think, funky things can occur during downstream molecular manipulations. All of this ends up as noise in the final data.
This may seem like a really elementary point, but if you are ONLY using a Nanodrop to define your concentration, it would be best to employ a second, confirmatory measurement. The Nanodrop was AWESOME when it came out because we did not have to use a ml of sample. But now we have additional orthogonal measurement tools, such as the Bioanalyzer (or Experion) and the newly beloved Qubit. Fragment range from the Bioanalyzer – check, 260/280 from the Nanodrop – check, concentration from the Qubit – check. This triangulation is how we determine whether a sample will pass or fail library construction at Cofactor, and it is quite stringent. Is this bad? Well, it certainly sets the bar high, but our library construction success is > 95% year over year… not too shabby (and our clients are happy that we can catch samples that might fail before they ever reach the sequencer!).
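To make the triangulation idea concrete, here is a minimal sketch of a three-instrument QC gate. Every cutoff, field name, and the Nanodrop-versus-Qubit agreement check below is a hypothetical placeholder chosen for illustration; none of them are Cofactor's actual pass/fail criteria, so tune them to your own sample type and kit.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only (assumptions, not Cofactor's criteria).
MIN_A260_280 = 1.8             # Nanodrop purity ratio, lower bound
MAX_A260_280 = 2.2             # Nanodrop purity ratio, upper bound
MIN_QUBIT_NG_PER_UL = 10.0     # Qubit concentration floor
MIN_MEDIAN_FRAGMENT_NT = 200   # Bioanalyzer median fragment size floor
MAX_NANODROP_QUBIT_FOLD = 2.0  # how far the two concentrations may disagree


@dataclass
class SampleQC:
    name: str
    a260_280: float            # from the Nanodrop
    nanodrop_ng_per_ul: float  # from the Nanodrop
    qubit_ng_per_ul: float     # from the Qubit
    median_fragment_nt: int    # from the Bioanalyzer trace


def qc_failures(sample):
    """Return a list of reasons the sample fails; an empty list means it passes."""
    failures = []
    if not (MIN_A260_280 <= sample.a260_280 <= MAX_A260_280):
        failures.append("260/280 out of range")
    if sample.qubit_ng_per_ul < MIN_QUBIT_NG_PER_UL:
        failures.append("Qubit concentration too low")
    if sample.median_fragment_nt < MIN_MEDIAN_FRAGMENT_NT:
        failures.append("fragments too short")
    # The second measurement is only useful if we actually compare it to the first.
    if (sample.qubit_ng_per_ul > 0
            and sample.nanodrop_ng_per_ul / sample.qubit_ng_per_ul > MAX_NANODROP_QUBIT_FOLD):
        failures.append("Nanodrop and Qubit disagree")
    return failures


if __name__ == "__main__":
    for s in (SampleQC("good", 2.0, 55.0, 50.0, 1500),
              SampleQC("degraded", 1.5, 120.0, 8.0, 150)):
        problems = qc_failures(s)
        print(s.name, "PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Returning the list of reasons, rather than a bare pass/fail, keeps the report useful when you have to explain to someone why their sample never made it to library construction.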
The goal during library construction is to generate a PCR product that sizes correctly and in general is a pretty good representation of what you observed in the original sample (ghosting is bad!!!). By the way, this does not include samples that undergo 25+ cycles of PCR (we can generate positives with a no-DNA control!!!), lack appropriate controls (no DNA, no primer, no polymerase), or are treated in any cavalier manner. This (library construction) is not the place to cut corners or forget QC/QA measures! Any screwup that you don’t catch here can cause catastrophic outcomes later!
OK, Deep breath…… Ahhhhhh,
All of the above holds true for DNA and RNA; however, you have an added layer of complexity with RNA arising from polyA enrichment or ribosomal depletion strategies (more on that in the next post).
The other thing to consider, as an aside to the stuff above, is over-amplification of a sample. Your current sequencing provider should be able to define their duplication percent range for each sample type (FFPE DNA, NuGen low input DNA, etc). Go ahead and throw them a curveball and ask them what their deduplication rate is for RNA-seq data… and then kick back. Up until the recent Bioo Scientific kit (1), it was not very easy to assess duplication rates in RNA-seq data unless you baked your own kit with barcoded adapters. If you are concerned about duplication in your RNA-seq data, we can use this kit on your samples to give additional insight. And, so as not to point you in the wrong direction: it is the convergence of duplication percent and deduplication that matters, not either one alone (why perform deduplication on a sample with 0% duplication percent?). Interesting reading can be found here.
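For a feel of how duplication percent gets computed, and why barcoded adapters change the answer, here is a minimal sketch. It assumes fragments have already been aligned and reduced to simple tuples of (chromosome, start, end, strand, barcode); the function name and input layout are made up for illustration and do not represent any particular kit's software or deduplication tool.

```python
from collections import Counter


def duplication_percent(fragments, use_barcode=False):
    """Percent of fragments that are extra copies of an already-seen fragment.

    fragments: iterable of (chrom, start, end, strand, barcode) tuples
               (a hypothetical, simplified stand-in for real alignments).
    use_barcode: if True, two fragments only count as duplicates when their
                 adapter barcodes match as well as their coordinates.
    """
    seen = Counter()
    total = 0
    for chrom, start, end, strand, barcode in fragments:
        key = (chrom, start, end, strand)
        if use_barcode:
            key += (barcode,)
        seen[key] += 1
        total += 1
    if total == 0:
        return 0.0
    duplicates = total - len(seen)  # everything beyond the first copy of each key
    return 100.0 * duplicates / total


if __name__ == "__main__":
    frags = [
        ("chr1", 100, 300, "+", "AAT"),
        ("chr1", 100, 300, "+", "AAT"),  # same coordinates, same barcode
        ("chr1", 100, 300, "+", "CGG"),  # same coordinates, different barcode
        ("chr2", 50, 250, "-", "AAT"),
    ]
    print(duplication_percent(frags))                    # coordinates only
    print(duplication_percent(frags, use_barcode=True))  # coordinates + barcode
```

In RNA-seq, a highly expressed transcript can legitimately produce independent fragments with identical start and end positions, so coordinates alone can overstate PCR duplication. Adding the barcode to the key is one way to tell those apart from true amplification copies.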
Tomorrow I’ll be talking about polyA enrichment and ribosomal removal…. tune in to hear me meander ecstatic about all things RNA-seq!
Ghosting – bands or smears that show up on gels, post library production, that originate from non-native genomic material such as adapters, primers, etc…
‘ases – slang for enzymes whose names end in “-ase” (nucleases, polymerases, and so on)…. oh, come on, you know this!
“baked your own kit” – develop your own kit from scratch. It may seem easy at first blush, but it is not!
Deduplication – the removal of sequencing reads generated from the same PCR-amplified fragment, based on alignments to a reference. For paired-end reads, deduplication algorithms often require duplicate reads to share the same start and end alignment positions.