Broad-Intel Genomics Stack* (BIGstack*)

Intel and the Broad Institute of MIT and Harvard are at the forefront of efforts to accelerate genomics analysis and realize the benefits it can produce. Together, Intel and Broad have introduced an integrated hardware and software solution that runs Broad's popular Genome Analysis Toolkit* (GATK*) faster, at unprecedented scale, and with simpler deployment. Generating a database from 2,300 genomes used to take six weeks. Now, with the Broad-Intel Genomics Stack* (BIGstack*), a database containing 5x more information can be generated in only two weeks.

BIGstack is a game-changing, end-to-end integrated hardware and software package. Its common, validated reference designs use the latest-generation Intel® Xeon® Scalable processors, Intel® Arria® 10 Field Programmable Gate Array (FPGA) PCIe* cards, Intel® Omni-Path Architecture (Intel® OPA), and Intel® 3D NAND Solid State Drives (Intel® SSDs). With these designs, BIGstack can help reduce the complexity of running the genomics analysis pipeline (specifically, the Broad Institute's production-worthy Best Practices workflows) while dramatically speeding up analysis.

This paper demonstrates how a BIGstack-based platform built on the latest Intel® Xeon® Scalable processors and Intel® 3D NAND SSDs achieves a throughput of up to 5 whole genomes and more than 100 whole exomes per day per node. For Broad's GATK Best Practices, Intel® FPGA technology further speeds up individual whole-genome analysis by up to 2.2x, at a lower memory cost, compared with a prior-generation Intel® Xeon® processor. The paper also describes the tools, technologies, optimizations, and methodology used, and reports latency, throughput, and CPU, memory, and disk utilization.