
Many methods have been introduced, and tools or programs based on these algorithms have been reported on an almost weekly basis to meet these challenges. 13 Doruk Bozdag and Umit Catalyurek from the Ohio State University proposed six parallelization methods to improve hash/index-based short-sequence mapping: partitioning reads only, partitioning genome only, partitioning reads and genome, suffix-based assignment (SBA), SBA after partitioning reads and SBA after partitioning genome (see Bozdag et al.). CloudBurst, presented by Schatz et al., 15 is a sensitive parallel seed-and-extend read-mapping algorithm, optimized for mapping single-end (SE) reads.
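The seed-and-extend idea and the simplest of the parallelization schemes named above (partitioning reads only) can be illustrated with a minimal sketch. It is not the implementation of CloudBurst or of Bozdag and Catalyurek's methods. The function names, the use of non-overlapping exact k-mer seeds, and the mismatch-only extension (no indels) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(genome, k):
    """Hash index: map every k-mer in the genome to its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read, genome, index, k, max_mismatches):
    """Seed-and-extend without indels (illustrative simplification).

    Seed: look up non-overlapping k-mers of the read in the index.
    Extend: compare the whole read against the implied genome window
    and keep it if the mismatch count is within the limit.
    Returns a sorted list of (genome_start, mismatches) pairs.
    """
    hits = {}
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(genome) or start in hits:
                continue
            window = genome[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


def map_reads_partitioned(reads, genome, k, max_mismatches, n_parts):
    """'Partitioning reads only': split the reads into n_parts chunks,
    each mapped against the full genome index. The chunks are processed
    sequentially here; in a parallel setting each chunk would go to a
    separate worker."""
    index = build_index(genome, k)
    chunks = [reads[i::n_parts] for i in range(n_parts)]
    results = {}
    for chunk in chunks:
        for read in chunk:
            results[read] = map_read(read, genome, index, k, max_mismatches)
    return results
```

In this sketch every worker would need the whole genome index, which is the memory cost that the genome-partitioning variants trade against extra merging of partial results.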

1 Alignment, as a classical problem in bioinformatics, requires finding the most credible source for the sequenced DNA, 11 using the information of which species the reads have been generated from. Aside from the shorter reads produced by NGS (compared with those from gel-capillary technology), we also have to consider two fundamental issues. One is the significantly greater amount of data, which requires optimized memory usage and speed, and the other is that the error profiles of the data differ from those of the previous technologies. These issues call for algorithms that can extract as much information as possible from the sequencing data. 10 Traditional methods such as pure Smith-Waterman dynamic programming, BLAT or BLAST may map the reads in a few days (given a large and expensive computer grid); however, such grids are not available to everyone. Some of the earlier programs that perform well on Sanger sequencing reads have not yet been adapted to the huge volumes of data produced by NGS. Moreover, second-generation sequencing has certain error characteristics, for example, Roche 454 tends to produce insertion or deletion errors in homopolymer runs, 12 and these therefore need to be considered when designing analysis tools.
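To make the cost of pure Smith-Waterman dynamic programming concrete, the following minimal sketch computes only the best local alignment score. The scoring values (match, mismatch, linear gap penalty) are illustrative assumptions, not parameters taken from any tool discussed here. The nested loops fill one cell for every pair of read and reference positions, so the work grows with the product of the two lengths, which is why applying it directly to millions of reads against a whole genome is impractical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, so memory is O(len(b)) while
    time remains O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because gaps are allowed, a read carrying one extra base in a homopolymer run (the kind of indel error described above for Roche 454) can still align across the run at the price of a single gap penalty.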

A massive number of tools for NGS read mapping and assembly have flooded the market to date. Considering the rapid developments in this field, we will discuss only some of the software with which we have first-hand experience, and compare their working efficiency in terms of sensitivity, accuracy, speed and random-access memory (RAM) requirement. The most important step in NGS analysis is the mapping of reads to the original sequences.
7 These important characteristics permit the ultra-deep sequencing technologies to be widely used in biology and medical research. In addition to the traditional applications of DNA sequencing in genome resequencing and SNP discovery, NGS technologies have made a huge and ongoing impact on transcriptome analysis, gene annotation and RNA splice identification; metagenomic 8 and genome methylation analysis 9 have also benefited from these new technologies. New applications are also likely to be unveiled in the coming years. 1 The most fundamental steps for almost all of these applications are the mapping of the reads to the reference genome and the assembly of the reads to attain the desired DNA sequence for analysis. However, certain obstacles stemming from the inherent characteristics of NGS need to be overcome before these technologies can be used extensively. The limitations of short read lengths (typically 35–400 bp compared with 650–800 bp for Sanger-based technology reads), low read accuracy in homopolymer stretches of identical bases, and non-uniform confidence in base calling require more efficient software and algorithms to help these new technologies develop further in the immediate future.


Three platforms have been available: the Roche/454 FLX (30), the Illumina/Solexa Genome Analyzer (7) and the Applied Biosystems SOLiD™ System. These methods are all based on a template amplification phase before sequencing. Two new systems, the Helicos HeliScope™ and Pacific Biosciences SMRT instruments, 6 which avoid the amplification step and use a single molecule as template, were also introduced recently. These new technologies are advantageous because of their high throughput, with over one billion reads per run, and their significantly lower cost per base, 2 which has given great impetus to achieving the goal of the 1000 Genomes Project.
‘Next-generation sequencing’ (NGS) platforms have been introduced and have recently become widely available, 1, 2 although large-scale sequencing laboratories contributed significantly to the Human Genome Project. 3, 4 Despite the dramatic improvements of that era, the limitations of the conventional Sanger (or di-deoxy terminator 5) strategy created an urgent need for new technologies capable of sequencing human genomes in parallel. Thanks to the recent availability of optical instruments and the application of molecular biology, 1 a series of new massively parallel sequencing technologies, the NGS technologies, has changed this scenario tremendously.
