If we are looking at protein-coding transcripts, an important step in the cDNA library preparation is the removal of ribosomal RNA from our samples. Ribosomal RNA (rRNA) accounts for 97-99% of the total RNA in a sample and it is wasteful and uninformative to sequence this material. For eukaryotes, the removal is typically performed by polyA-isolation: only transcripts with a polyA tail are kept. For prokaryotes, it is performed by ribosomal depletion: oligos matching ribosomal sequences are hybridised to the rRNA, and the bound rRNA is magnetically separated from the non-ribosomal RNA.
There are two potential effects of these procedures that we should look out for in our data, as they could confound our later analysis.
3’ bias can occur because the polyA tail is found at the 3’ end of the RNA, so only fragments still attached to the 3’ end are isolated and sequenced. If your RNA is high quality, the whole transcript is attached to the polyA tail and there should be reasonably even coverage across the transcript. However, if your RNA is degraded, only the portion of the transcript still attached to the polyA tail will be isolated and make it through to the library. Thus a significant 3’ bias suggests that our starting RNA was of poor quality.
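To make the idea of 3’ bias concrete, here is a minimal sketch of one simple way to score it from per-base coverage along a single transcript. The function name `three_prime_bias`, the 10% window size and the toy coverage values are assumptions made for illustration; this is not the exact metric reported by any particular QC tool.

```python
def three_prime_bias(coverage, window_frac=0.1):
    """Score 3' bias for one transcript.

    coverage: per-base read depth, ordered from the 5' end to the 3' end.
    window_frac: fraction of the transcript length used for each end window
                 (an illustrative choice, not a standard value).
    Returns 3' coverage / (5' coverage + 3' coverage), or None if both
    end windows have no coverage. About 0.5 means even coverage at both
    ends; values near 1.0 mean reads pile up at the 3' end.
    """
    n = len(coverage)
    if n == 0:
        return None
    w = max(1, int(n * window_frac))
    five_prime = sum(coverage[:w]) / w
    three_prime = sum(coverage[-w:]) / w
    total = five_prime + three_prime
    if total == 0:
        return None
    return three_prime / total


# Toy examples (made-up numbers): an intact transcript and a degraded one.
intact = [10] * 100
degraded = [0] * 50 + [10] * 50

print(three_prime_bias(intact))
print(three_prime_bias(degraded))
```

In practice the score would be summarised over many transcripts, since any single transcript can show uneven coverage for unrelated reasons.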
rRNA abundance after ribosomal depletion should be assessed as part of the sample QC. If rRNA remains in the samples, it is likely to occur in different amounts in different samples. This will alter the proportion of mRNA between samples and confound our experiment by increasing the apparent variability in gene expression. The presence of significant amounts of rRNA in our samples could indicate a problem with the library preparation, or RNA that was degraded, preventing oligos from hybridising to the rRNA. In this case we may need to repeat the sequencing experiment.
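As a rough illustration of this check, the sketch below computes the fraction of reads assigned to rRNA in each sample and flags samples above a cutoff. The biotype labels, the read counts and the 10% cutoff are all hypothetical; an appropriate threshold depends on the protocol and organism.

```python
def rrna_fraction(counts_by_biotype):
    """Return the fraction of assigned reads that map to rRNA.

    counts_by_biotype: dict mapping a biotype label (e.g. "rRNA",
    "protein_coding") to a read count. Returns None if there are no reads.
    """
    total = sum(counts_by_biotype.values())
    if total == 0:
        return None
    return counts_by_biotype.get("rRNA", 0) / total


# Made-up read counts for two samples, for illustration only.
samples = {
    "sample_A": {"protein_coding": 950, "rRNA": 50},
    "sample_B": {"protein_coding": 600, "rRNA": 400},
}

MAX_RRNA_FRACTION = 0.10  # assumed cutoff, not a recommended standard

fractions = {name: rrna_fraction(counts) for name, counts in samples.items()}
flagged = [
    name
    for name, frac in fractions.items()
    if frac is not None and frac > MAX_RRNA_FRACTION
]

print(fractions)
print(flagged)
```

Comparing the fractions across samples matters as much as the absolute values, since it is the sample-to-sample differences in rRNA content that inflate apparent variability.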
We can check the abundance of rRNA in our data using RNA-SeQC, which is available from the Broad Institute. In the interest of time we won’t run this program in the transcriptomics workshop, but it is useful to know about this quality control step, especially if you have trouble detecting mRNA in your own dataset.