The performance of NGS alignment tools has improved continuously, to the point that sorting the aligned data now takes longer than the alignment itself. SAMtools version 1.x took a big step forward by adding support for multithreaded sorting.
We compared the speed of SAMtools and sambamba when sorting a 25 GB unsorted BAM file. Sambamba was 2x faster: as the following violin plot shows, SAMtools took 20 minutes, while sambamba sorted the same file in 10 minutes. The narrow plot for sambamba indicates that its run time is also more predictable than that of SAMtools.
Despite supporting multiple threads, SAMtools does not parallelize well. During the first half of the run, it used just a single thread, with an occasional spike in CPU usage, and during the second half its CPU usage was a little over 20%. Sambamba used 30-40% CPU during the first half and over 90% during the second.
The tests were run on an AWS c3.8xlarge instance (32 cores, 60 GB RAM), with the files stored on local storage. The unsorted BAM file was generated by STAR. We used the following commands:
sambamba sort -t 30 -m 45G -o Input.hg19.sambamba-sort.bam Input.hg19.Aligned.out.bam
samtools sort -@ 30 -m 1500M -T __sam_tmp__ -o Input.hg19.samtools-sort.bam Input.hg19.Aligned.out.bam
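If you want to repeat a comparison like this on your own data, a small wrapper that runs each command several times and records the wall-clock time is enough to collect the numbers for a plot. The sketch below is one possible way to do it, not the harness we used: the `time_runs` helper is a hypothetical name, and the run count of 5 in the usage lines is an assumption, since the number of runs is not stated above. Timing uses `date +%s`, so the resolution is one second, which is adequate for jobs that take minutes.

```shell
#!/bin/sh
# Hypothetical helper: run a command a fixed number of times and print one
# tab-separated line per run: <label> <run number> <elapsed seconds>.
# The body runs in a subshell, so its variables do not leak into the caller.
time_runs() (
    label=$1
    runs=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # Send the command's own stdout to stderr so only timings reach stdout;
        # stop and return a non-zero status if the command fails.
        "$@" >&2 || exit 1
        end=$(date +%s)
        printf '%s\t%d\t%d\n' "$label" "$i" "$((end - start))"
        i=$((i + 1))
    done
)

# Example usage with the commands from this post (uncomment to run; this
# needs both tools installed and the input BAM file in the current directory):
# time_runs sambamba 5 sambamba sort -t 30 -m 45G -o Input.hg19.sambamba-sort.bam Input.hg19.Aligned.out.bam > timings.tsv
# time_runs samtools 5 samtools sort -@ 30 -m 1500M -T __sam_tmp__ -o Input.hg19.samtools-sort.bam Input.hg19.Aligned.out.bam >> timings.tsv
```

The resulting `timings.tsv` has one row per run, which can be loaded into any plotting tool to draw the distribution of run times for each sorter.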
See for Yourself
If you’ve run into situations similar to the ones described above, we invite you to give Basepair a try for free. If you do run an analysis, please let us know how it goes via email@example.com. We’d love to hear your feedback!