CoreGenomics: January 2015

Monday 26 January 2015

Low-input RNA-seq: ever wondered how much of your total RNA is of any real interest?

The amount of RNA used in an RNA-seq library prep is often listed as a competitive advantage by kit manufacturers. As late as 2004 I used up to 30 µg of RNA in a microarray prep, and even a couple of years ago 100 ng was considered "low". Nowadays kits are available for picogram quantities, but have you ever considered how much of the total RNA you measure is actually going to be informative?

The answer is: not a lot! Stopping to think about this is important, as the amount of something in a sample directly affects how easily we can measure it. Wendell Jones (Global Head of Bioinformatics at Expression Analysis) gave a great talk at the recent RNA-seq Europe meeting, where he discussed the relative abundance of different RNAs and the ease (or not) of measuring them on different gene expression platforms. He kindly gave me a copy of his slide deck, and I've used it as the basis for my figures below.

RNA QT/QC: We often measure RNA quantity with RiboGreen and quality using the Bioanalyser. When you look at a typical Bioanalyser trace you'll see two major peaks from the 18S and 28S ribosomal RNAs; their ratio is one of the features used to calculate the RIN. It should be obvious to all that what we see on the Bioanalyser are two stonking great peaks from just two rRNAs, and these, usually unimportant from our perspective, account for a large portion of the total RNA in our Eppendorfs.

What is total RNA: The RNA we get after a total RNA extraction is a complex mix of millions of transcripts. However, it is also a mix dominated by very few species: tRNA, rRNA and some very highly expressed transcripts (e.g. globin or Rubisco). The RNAs we are usually interested in are expressed at very low levels compared to these, and at first glance at the figure below you might wonder how we measure any of them at all! We can because the most abundant RNAs, namely tRNA and rRNA, are uninteresting to most scientists, so we usually enrich for the mRNA/ncRNA, which sit in the bottom 5% (by abundance).
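To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes, for illustration only, that a fixed 5% of the total RNA mass is the mRNA/ncRNA we actually want (reading "bottom 5%" as a mass fraction; the real figure varies by tissue and organism). The `informative_mass` helper and the 100 pg example input are hypothetical, not taken from any kit.

```python
def informative_mass(total_ng: float, mrna_fraction: float = 0.05) -> float:
    """Mass (ng) of the input that is mRNA/ncRNA, assuming a fixed fraction.

    The 5% default is an illustrative assumption, not a measured value.
    """
    if not 0 < mrna_fraction <= 1:
        raise ValueError("mrna_fraction must be in (0, 1]")
    return total_ng * mrna_fraction


# Input amounts in ng of total RNA (the 100 pg kit input is a hypothetical example).
inputs_ng = {
    "2004 microarray prep (30 µg)": 30_000,
    '"low" input a couple of years ago (100 ng)': 100,
    "picogram-scale kit (100 pg)": 0.1,
}

for label, ng in inputs_ng.items():
    print(f"{label}: ~{informative_mass(ng):g} ng of mRNA/ncRNA")
```

Under that assumption, a 100 pg input leaves only about 5 pg of the material you actually care about.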

If you look at a typical Bland-Altman plot (below), you'll see how the spread of gene expression data (e.g. between two replicates) increases at lower expression levels due to measurement noise. Wendell simplified this in his presentation so we can more easily see that there is a point at which we move from quantitation of transcripts to detection. The line is of course artificial, and where you draw it will depend on many things.

RNA-seq Bland-Altman plot
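A quick simulation shows where that funnel shape comes from. This is my own sketch, not Wendell's analysis: it assumes pure Poisson counting noise between two replicates (real data add biological and library-prep variation on top), adds a pseudocount of 1 before taking log2, and all function names are hypothetical. For each mean count it computes the Bland-Altman M (log2 difference between replicates) and reports how much M spreads.

```python
import math
import random
import statistics


def poisson(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def bland_altman(x: int, y: int, pseudocount: float = 1.0) -> tuple[float, float]:
    """Return (A, M) for one transcript: mean log2 count and log2 difference."""
    lx, ly = math.log2(x + pseudocount), math.log2(y + pseudocount)
    return (lx + ly) / 2, lx - ly


def spread_by_abundance(means: list[float], draws: int = 200, seed: int = 1) -> dict[float, float]:
    """Standard deviation of M between two simulated replicates at each mean count."""
    rng = random.Random(seed)
    spread = {}
    for lam in means:
        ms = [bland_altman(poisson(rng, lam), poisson(rng, lam))[1] for _ in range(draws)]
        spread[lam] = statistics.stdev(ms)
    return spread


for lam, sd in spread_by_abundance([2, 20, 200]).items():
    print(f"mean count {lam:>3}: SD of M = {sd:.2f}")
```

The spread of M shrinks as the mean count rises (roughly in proportion to 1/√mean once counts are not tiny), which is why low-abundance transcripts drift from quantitation into mere detection.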

We can take advantage of this and gain flexibility in the dynamic range of our experiments by sequencing to different depths (the usual approach) and/or by adding replicates (preferable).
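The depth half of that trade-off can be sketched with a simple Poisson model: a transcript making up a fraction f of the library expects depth × f reads, and the chance of seeing at least k of them follows from the Poisson distribution. The 10-read threshold and the 1-in-10-million abundance below are hypothetical values chosen for illustration, not a recommended cut-off.

```python
import math


def p_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf


def p_quantifiable(depth: int, relative_abundance: float, min_reads: int = 10) -> float:
    """Probability a transcript receives at least min_reads reads at a given depth."""
    return p_at_least(min_reads, depth * relative_abundance)


for depth in (20_000_000, 50_000_000, 200_000_000):
    p = p_quantifiable(depth, relative_abundance=1e-7)
    print(f"{depth / 1e6:.0f}M reads: P(>=10 reads) = {p:.3f}")
```

This only models counting noise. Adding replicates tackles the biological variability that extra depth cannot, which is one reason they are the preferable route.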

How does RNA-seq compare to other DGE methods: Wendell presented a great slide comparing where the detection/quantitation boundary lies for several differential gene expression technologies. Most of us are rarely going to go past 20 or 50M reads for differential gene expression, so qPCR looks like a tool we'll still be using for many years to come. How it competes against some of the newer targeted RNA-seq assays will be interesting to see, and molecular barcodes, by improving measurement of true transcript abundance, are going to give newer RNA-seq methods an edge over current ones.
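The idea behind molecular barcodes (unique molecular identifiers, UMIs) is simple: each original molecule is tagged with a random barcode before amplification, so PCR copies of the same molecule share a barcode and can be collapsed, letting you count molecules rather than reads. A minimal sketch, assuming reads have already been assigned to a gene and their UMI extracted; real tools also correct sequencing errors within UMIs, which this ignores. Gene names, barcodes and the function name are made up.

```python
from collections import defaultdict
from typing import Iterable


def count_reads_and_molecules(
    reads: Iterable[tuple[str, str]],
) -> tuple[dict[str, int], dict[str, int]]:
    """Return (reads per gene, unique UMIs per gene) from (gene, umi) pairs."""
    read_counts: dict[str, int] = defaultdict(int)
    umis: dict[str, set[str]] = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis[gene].add(umi)  # PCR duplicates carry the same UMI and collapse here
    return dict(read_counts), {gene: len(seen) for gene, seen in umis.items()}


# Hypothetical reads: GENE_A has one molecule amplified three times plus a second molecule.
reads = [("GENE_A", "ACGT"), ("GENE_A", "ACGT"), ("GENE_A", "ACGT"),
         ("GENE_A", "TTGA"), ("GENE_B", "CCAT")]
by_reads, by_molecules = count_reads_and_molecules(reads)
print(by_reads)      # {'GENE_A': 4, 'GENE_B': 1}
print(by_molecules)  # {'GENE_A': 2, 'GENE_B': 1}
```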

How low can you go: It is important to remember that while there are methods that work with incredibly low amounts of RNA, including single-cell RNA-seq, the lower your inputs go, the less chance you have of sequencing the RNAs you might be interested in, especially if they are low-abundance transcripts. Sampling error is something you really need to understand before dropping inputs way down. In a microarray experiment we showed that reduced RNA input had a clear impact on detection sensitivity but no impact on specificity: even at low inputs, when we saw differential gene expression the results were accurate (see Lynch et al., The cost of reducing starting RNA quantity for Illumina BeadArrays). The same is (hopefully) going to be true for RNA-seq. For single cells it will be interesting to see what the community decides is the right read type and read depth to use. I'd be surprised if we go above 10M reads, and we might prefer to use 384 cells with just 1M reads each.
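Sampling error at low input can be sketched with a simple binomial model. Assume a transcript is present at some number of copies per cell, the input holds some number of cells' worth of RNA, and each molecule independently makes it into the library with a given capture efficiency; the chance that not a single molecule is captured is then (1 - efficiency) raised to the number of molecules. The 5 copies per cell and 10% efficiency below are illustrative guesses, not measured values, and `p_missed` is a hypothetical name.

```python
def p_missed(copies_per_cell: int, cells: int, capture_efficiency: float) -> float:
    """Probability that zero molecules of a transcript are captured.

    Assumes each of the copies_per_cell * cells molecules is captured
    independently with probability capture_efficiency (a binomial model).
    """
    if not 0 <= capture_efficiency <= 1:
        raise ValueError("capture_efficiency must be between 0 and 1")
    molecules = copies_per_cell * cells
    return (1 - capture_efficiency) ** molecules


# A low-abundance transcript (5 copies per cell) at an assumed 10% capture efficiency.
for cells in (1, 10, 100):
    print(f"{cells:>3} cell(s) of input: P(transcript missed) = {p_missed(5, cells, 0.1):.3g}")
```

With one cell's worth of input the transcript is missed more often than not, while 100 cells' worth makes a miss vanishingly unlikely, even though nothing about the biology has changed.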

Acknowledgements: Thanks very much to Wendell for sharing these slides.

Friday 23 January 2015

AGBT here we come...

Thursday 22 January 2015

BGI: Illumina's biggest customer turns competitor

"We live in an interesting age" or so the quote goes. It is very interesting that BGI are going to start selling sequencing technology. The coverage of Jun Wang's (BGI CEO) JP Morgan presentation by GenomeWeb lays out what they plan to do to start actively competing with Illumina: exome sequencing, NIPT on BGISEQ-1000 and now new sequencers that will compete directly with HiSeq and MiSeq.

I am not sure, but would hazard a guess that BGI is still one of Illumina's largest customers for instruments and reagents. It must be an interesting relationship to manage from both sides!

Monday 19 January 2015

Illumina's new sequencers - what will we do now?

I was as surprised as everyone else by the most recent announcements from Illumina (JP Morgan post), especially given that V4 chemistry is just 6 months old, and I quickly sketched out my initial thoughts at the beginning of the week. There's been lots of other coverage, posts (MassG, Omics O, Mick, LabSpaces), and activity on Twitter.

I've now had a chance to digest my thoughts and plan out what I hope we can do here at the CRUK Cambridge Institute - "I'll take two please."

Illumina's announcements from JP Morgan

For those that are interested you can listen to the presentation by Jay Flatley at JP Morgan. And if you can't be bothered, then here are my notes (be warned they may contain inaccuracies):

Illumina's new sequencers - my initial thoughts

Illumina continue to push hard on making DNA sequencing cheaper and easier to access for everyone. The announcements at last night's JP Morgan conference included a smaller X Ten, the HiSeq 3000/4000 and the NextSeq 550. I'm not going into much detail about the instruments (you can get the specs on Illumina's website), but I will talk about the impact of the next step in Genome Analyser evolution.

Briefly, the X Ten is joined by the X Five System: half the price and only 9,000 genomes a year. The NextSeq 550 also allows labs to scan Illumina microarrays (remember those?), but the focus is clearly cytogenetics and PGD; the system is not capable of running HT-12s, for instance.

HiSeq 3000/4000: The latest update for HiSeq is another big step forward, and somewhat surprising as it comes only a year after the release of V4 chemistry on the 2500. The patterned flowcells are here. Interestingly, there was no mention of upgrades from the 2500 to the 4000. The 4000 is the machine I'd want in my lab, and the specs are pretty exciting: 1.5 Tb in three days on PE150, 5 billion clusters (the GA was lucky to get 1M per lane when first released), and still high quality, with ≥75% of bases above Q30. That's 12 human genomes, 180 exomes or 250 RNA-seq samples per run - we'll be busy making libraries!
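The headline numbers hang together, and it is worth working out the per-sample budget before planning projects. A quick check using the quoted specs (5 billion clusters, PE150); the ~3.1 Gb human genome size is my assumption, and this ignores duplicates, adapters and failed reads, so real usable yields will be lower.

```python
CLUSTERS_PER_RUN = 5_000_000_000  # quoted spec: 5 billion clusters
READ_LENGTH_BP = 150              # PE150
READS_PER_CLUSTER = 2             # paired-end: one read from each end
HUMAN_GENOME_BP = 3_100_000_000   # assumption: approximate haploid human genome size

run_yield_bp = CLUSTERS_PER_RUN * READ_LENGTH_BP * READS_PER_CLUSTER


def clusters_per_sample(samples_per_run: int) -> float:
    """Read pairs each sample gets if the run is split evenly."""
    return CLUSTERS_PER_RUN / samples_per_run


genome_coverage = run_yield_bp / 12 / HUMAN_GENOME_BP

print(f"Run yield: {run_yield_bp / 1e12:.1f} Tb")
print(f"Per genome (12/run): ~{genome_coverage:.0f}x coverage")
print(f"Per exome (180/run): ~{clusters_per_sample(180) / 1e6:.0f}M read pairs")
print(f"Per RNA-seq sample (250/run): {clusters_per_sample(250) / 1e6:.0f}M read pairs")
```

So 5 billion PE150 clusters is exactly the quoted 1.5 Tb, each genome gets roughly 40x, and each RNA-seq sample about 20M read pairs.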

ctDNA: Illumina also presented their circulating tumour DNA R&D program, with kits coming in 2015; this is something we'll be watching closely.

I'll post again in the next day or two once I've distilled my thoughts on the impact for labs like mine.

Friday 9 January 2015

The not-so-rapid decreasing costs of genetic testing

There is a whole big noisy discussion going on around the regulation of NGS tests, which has been covered in some detail by GenomeWeb and other bloggers. My lab does not develop tests, but I have a real interest in seeing how NGS is being translated to the clinic and reading what's going on is very interesting indeed.

GATC: acknowledging the acknowledgment

Shortly after writing my post on Authorship and acknowledgement of core facility work, I was pointed to GATC's Top publications of 2014, where GATC list the papers in which their sequencing (and other) services have been used and acknowledged.

Congratulations GATC: four Nature papers and one Cancer Cell paper in your 2014 Top 5 can't be bad!
