Is it necessary to do base quality score recalibration (BQSR) in the GATK pipeline? How should this be done when no VCF file of known sites is available?


(a) Unfortunately, the quality scores produced by next-generation sequencing instruments are subject to various sources of systematic technical error, which leads to over- or under-estimated base quality scores in the data. Some of these errors come from the physics or chemistry of the sequencing reaction, and some are probably due to inconsistencies in the way different instruments perform. BQSR applies machine learning to model these errors empirically and adjusts the quality scores accordingly. This gives more accurate base qualities, which in turn improves the accuracy of variant calls. We provide a BQSR step as part of the GATK pipeline on the Basepair platform.
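To make the idea of an over-estimated quality score concrete, here is a minimal sketch in Python. It is not GATK's actual model: it only compares a reported Phred score with an empirical one computed from mismatch counts, and the function names, the counts, and the +1/+2 smoothing are assumptions chosen for illustration.

```python
import math


def phred(error_rate):
    """Convert an error probability into a Phred-scaled quality score."""
    return -10.0 * math.log10(error_rate)


def empirical_quality(mismatches, observations):
    """Phred-scaled quality estimated from observed mismatches.

    The +1/+2 smoothing avoids log10(0) when no mismatches are seen;
    it is an assumption of this sketch, not GATK's exact estimator.
    """
    return phred((mismatches + 1) / (observations + 2))


# Hypothetical group of bases that the instrument all reported as Q30
# (an expected error rate of 1 in 1000).
reported_q = 30
observed_q = empirical_quality(mismatches=999, observations=99998)

# The observed error rate is about 1 in 100, so these bases behave like
# roughly Q20: the reported score was over-estimated by about 10 points.
print(f"reported Q{reported_q}, empirical Q{observed_q:.1f}")
```

Mismatches at real variant sites are not sequencing errors, so they have to be excluded from such counts. That is why the recalibration step needs a set of known variant sites.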

(b) BQSR is an optional but highly recommended step in variant calling analysis. It needs a set of known variant sites so that true variants are not counted as sequencing errors. If you are working with an organism for which no set of known variants is available, you can produce one for use in this step, although it requires some additional processing. This procedure is known as bootstrapping. It entails calling variants without running BQSR, filtering those variants to obtain a high-confidence set, and then using that set as the known-sites input for the BQSR step.
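The bootstrapping procedure can be sketched as a sequence of GATK4 commands. The Python snippet below only builds and prints the command lines, it does not run them. The file names, the helper function, and the hard-filter thresholds are hypothetical placeholders, and the thresholds in particular would need tuning for your organism and data.

```python
import shlex


def bootstrap_commands(ref, bam, tag="round1"):
    """Build GATK4 command lines for one round of BQSR bootstrapping.

    All output file names are hypothetical and derived from `tag`.
    """
    raw = f"{tag}.raw.vcf.gz"
    flagged = f"{tag}.flagged.vcf.gz"
    known = f"{tag}.highconf.vcf.gz"
    table = f"{tag}.recal.table"
    recal_bam = f"{tag}.recal.bam"
    # Illustrative hard-filter expression; these thresholds are assumptions.
    expr = "QD < 2.0 || FS > 60.0 || MQ < 40.0"
    return [
        # 1. Call variants on the original, unrecalibrated reads.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", raw],
        # 2. Flag low-confidence calls with a hard filter.
        ["gatk", "VariantFiltration", "-R", ref, "-V", raw,
         "--filter-name", "hard_filter", "--filter-expression", expr,
         "-O", flagged],
        # 3. Keep only the calls that passed, as the high-confidence set.
        ["gatk", "SelectVariants", "-R", ref, "-V", flagged,
         "--exclude-filtered", "-O", known],
        # 4. Build the recalibration table using that set as known sites.
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
         "--known-sites", known, "-O", table],
        # 5. Apply the recalibration to the original reads.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
         "--bqsr-recal-file", table, "-O", recal_bam],
    ]


commands = bootstrap_commands("reference.fasta", "sample.bam")
for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be reviewed and then run in a shell. In practice the cycle is often repeated, calling variants again on the recalibrated BAM, until the recalibration results stop changing noticeably.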