Benchmarking as a tool for quality in genetic testing

Ensuring accuracy in genomic medicine

Modern medical tests based on DNA and RNA sequencing rely on pipelines that combine numerous hardware systems, complex proprietary and open-source software, custom-built scripts, and manual, human-directed computational and decision-making steps.

These pipelines are largely unique to each institution, and pose significant challenges from a quality assurance perspective. In short, how can we be sure that a given genomics test is accurate, and that it provides true information about a patient?

The problem

After generating sequence data, numerous bioinformatics steps are needed to identify the unique genetic markers that contribute to a patient’s condition. Though many bioinformatics tools are available for this important task, there is little consensus as to what constitutes an optimal approach. Furthermore, the field lacks a comprehensive understanding of the ground truth, so determining whether a tool performs well is not straightforward. To address this, several groups and initiatives have developed gold-standard datasets to benchmark against.

Comparing whole genome pipelines

We have undertaken a benchmarking comparison based on several of these gold-standard resources for three different emerging analysis pipelines for whole genome sequencing. These pipelines represent traditional bioinformatics workflows (GATK), FPGA-based tools (DRAGEN), and AI-based pipelines (DeepVariant). Using different gold-standard datasets, we examined various performance features that contribute to the ability to identify changes in single DNA bases and small insertions and deletions.

We find that all of the solutions tested perform very well, but at very different speeds. More importantly, we find that by mixing elements from the three approaches, it is possible to speed up the analysis significantly while maintaining very high levels of precision and recall. Our study is now being submitted to a scientific journal for publication.
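For readers unfamiliar with the terms, precision and recall can be illustrated by comparing the variants a pipeline reports with a gold-standard truth set. The following is a minimal sketch, not the method used in the study: the `Variant` type, the `benchmark` function, and the example variants are all hypothetical, and the comparison assumes exact matching on chromosome, position, reference allele, and alternate allele. Real benchmarking workflows typically also reconcile different representations of the same variant and restrict the comparison to well-characterized regions of the genome.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Score a call set against a truth set using exact matching (an assumption)."""
    tp = len(calls & truth)   # true positives: called and present in the truth set
    fp = len(calls - truth)   # false positives: called but absent from the truth set
    fn = len(truth - calls)   # false negatives: in the truth set but not called

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up example data: three single-base changes and one small deletion.
truth = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 1500, "GTC", "G"),
    Variant("chr3", 300, "T", "C"),
}
calls = {
    Variant("chr1", 1000, "A", "G"),    # matches the truth set
    Variant("chr1", 2000, "C", "T"),    # matches the truth set
    Variant("chr2", 1500, "GTC", "G"),  # matches the truth set
    Variant("chr4", 42, "G", "A"),      # not in the truth set
}

result = benchmark(calls, truth)
print(result)
```

In this made-up example, three of the four calls are correct and three of the four true variants are found, so precision and recall are both 0.75.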

Standardization in DNA-based diagnostics

The standardization and harmonization of sequencing-based medical diagnostics is still in its infancy. To provide accurate tests, hospitals need not only the right know-how, but also to continually monitor the quality of their tests and benchmark these against common community standards.

Furthermore, the discoveries promised by data-sharing projects such as the EU 1 million genomes initiative, which Norway has recently signed on to, will only be realized if data is generated with traceable methods and meets suitable quality levels. Benchmarking efforts such as this work are an indispensable tool for building the trust that makes such initiatives possible, and are critical for ensuring test validity in hospitals.

An introduction to “Accuracy and efficiency of germline variant calling pipelines for human genome data”, an article published in Nature

Advances in next-generation sequencing (NGS) technology have enabled whole genome sequencing (WGS) to be widely used for identifying causal variants across a spectrum of genetic disorders, and have provided new insight into how genetic variants affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, a systematic performance comparison of these pipelines is needed to guide the application of WGS in scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. Our results aid the choice of bioinformatics pipelines for reliable variant detection, which is critical in medical research and clinical applications.
