As next-generation sequencing applications multiply, libraries that deviate from a perfectly balanced base composition are becoming unavoidable. Amplicon libraries, libraries generated by restriction digest, and reduced representation libraries such as those for bisulphite sequencing all introduce low diversity sequence, particularly in the first few bases of the library fragment. In addition, multiplexing with the barcode placed at the junction between the adapter and the DNA library (inline barcodes) also introduces low sequence diversity at the start of the resulting library.
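To make "low diversity in the first few bases" concrete, here is a minimal Python sketch that tabulates the fraction of each base at every read position. The function name `per_cycle_base_fractions` and the four example reads are hypothetical and purely illustrative; in practice the reads would come from a FASTQ file.

```python
from collections import Counter

def per_cycle_base_fractions(reads, n_cycles):
    """For each of the first n_cycles positions, return the fraction of A, C, G and T.

    Reads shorter than a given position are ignored at that position,
    and bases other than A/C/G/T (e.g. N) are not counted.
    """
    fractions = []
    for i in range(n_cycles):
        counts = Counter(read[i] for read in reads if len(read) > i)
        total = sum(counts[base] for base in "ACGT")
        fractions.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions

# Made-up reads that all begin with the same three bases, as might happen
# after a restriction digest or with a single inline barcode.
reads = ["CGGTACGA", "CGGATCCA", "CGGCAGTT", "CGGGTTAC"]
profile = per_cycle_base_fractions(reads, 5)

for cycle, frac in enumerate(profile, start=1):
    print(cycle, frac)
```

In this toy example the first three cycles contain a single base each, while the fourth cycle is evenly split between the four bases. A fully balanced library would sit near 25% per base at every cycle.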
Image-based sequencers such as the Illumina and SOLiD platforms struggle with libraries of uneven base composition, particularly in the initial cycles. The Illumina machines use the first few cycles to determine cluster coordinates, which sets the stage for the rest of the sequencing run. Inaccurate cluster mapping can therefore cause significant data loss. How do we overcome this problem? At Cofactor Genomics, our solution lies in devising innovative ways to push our sequencing platforms beyond vendor specifications, so that our customers can address novel biological questions that require this type of sequencing.
Figure 1: A low diversity library sequenced on the Illumina platform shows low read quality, particularly in early cycles. x-axis: Position in read (bp), y-axis: Q-score. Graph generated using FastQC.
Non-image-based platforms such as the Ion Torrent PGM and the Ion Proton Sequencer provide some relief from the imaging issues inherent to low diversity libraries. However, the cost per base and data output of these machines have yet to catch up with the demands of many sequencing applications. Therefore, strategies to improve sequencing of low diversity libraries on other instruments, particularly Illumina machines, have been an area of focus for us. Over time we have implemented a number of these strategies to successfully sequence low diversity libraries. Accurate cluster calling for low diversity libraries is more likely if cluster numbers are low.
Figure 2: The same low diversity library shown in Figure 1 was sequenced using a custom primer, leading to a drastic improvement in quality scores. x-axis: Position in read (bp), y-axis: Q-score. Graph generated using FastQC.
Therefore, low diversity libraries must be clustered at lower concentrations than balanced libraries. For multiplexing with inline barcodes, the solution lies in improving the experimental design so that several barcodes are pooled to achieve a more balanced base composition in the lane. For other applications where such pooling is not possible, samples can benefit from the addition of a balanced library (e.g. Illumina’s PhiX control or another balanced genomic DNA library) to the same lane as the low diversity sample. A low concentration of the balanced library helps achieve sufficient cluster numbers to allow more accurate cluster mapping. This solution works best for samples that have low diversity over the entire length of the DNA fragment to be sequenced. Lastly, if the low sequence diversity is limited to the first few bases of the library, it can be overcome with a custom sequencing primer, so that the first few bases are skipped and sequencing begins further into the fragment, where the base composition is likely to be more balanced.
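The effect of barcode pooling and of adding a balanced library can be illustrated with a little arithmetic. The Python sketch below is a simplified model, not our production protocol. It assumes that every barcode contributes an equal number of clusters and that the spiked-in library is exactly 25% of each base at every position. The function names, the four barcodes and the 50% spike-in fraction are hypothetical and chosen only to keep the numbers easy to follow.

```python
def position_fractions(seqs):
    """Per-position base fractions for a set of equally represented sequences."""
    length = min(len(s) for s in seqs)
    fractions = []
    for i in range(length):
        column = [s[i] for s in seqs]
        fractions.append({b: column.count(b) / len(column) for b in "ACGT"})
    return fractions

def mix_with_balanced(fractions, spike_fraction):
    """Blend per-position fractions with an idealised balanced library.

    spike_fraction is the share of clusters from the balanced library,
    which is assumed to be exactly 25% of each base at every position.
    """
    return [
        {b: (1 - spike_fraction) * f[b] + spike_fraction * 0.25 for b in "ACGT"}
        for f in fractions
    ]

def max_base_fraction(fractions):
    """Largest share held by any single base at any position (0.25 is ideal)."""
    return max(max(f.values()) for f in fractions)

# Hypothetical inline barcodes, for illustration only.
single = position_fractions(["ACGT"])
pooled = position_fractions(["ACGT", "CGTA", "GTAC", "TACG"])
spiked = mix_with_balanced(single, 0.5)

print(max_base_fraction(single))   # one barcode: a single base at every cycle
print(max_base_fraction(pooled))   # four complementary barcodes
print(max_base_fraction(spiked))   # one barcode plus a balanced spike-in
```

Under these assumptions a single barcode puts 100% of the signal on one base at each cycle, the four-barcode pool brings every base to 25%, and an equal mix with a balanced library reduces the dominant base to 62.5%. Real lanes deviate from this model because cluster yields differ between libraries.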
Our team’s dedication to a successful project has been the motivation behind testing all of these solutions and devising protocols to successfully sequence these low diversity samples.