Genomics is the study of all of a person’s genes (the genome), including interactions of those genes with each other and with the person’s environment. Since the late 1990s, academic and industry analysts have identified the potential of genomics to shorten development time and reduce investment, largely through efficiency gains.
Overview of the HPC Market for Genome Sequencing
This market briefing provides a high-level overview of the market for genomics, its drivers and inhibitors. It discusses genomics in the context of the wider high performance computing (HPC) market, and of the high performance data analytics and cloud computing market specifically. It ends with a discussion of the benefits to genome sequencing that CloudLightning anticipates demonstrating using simulations. It draws on publicly available desk research from industry analysis.
Genomics provides pharmaceutical companies with long-term upside and competitive advantage through savings along the entire R&D value chain and more accurate decision making. Genomics can result in more successful drug discovery. Relatively small efficiency and accuracy gains in the pharmaceutical industry can result in disproportionately large benefits in terms of European employment and GDP. However, genome processing requires substantial computational power and storage, which in turn demand significant infrastructure and specialist IT expertise.
Genomics has two core activities:
- Sequencing: a laboratory-based process involving “reading” DNA from the cells of an organism and digitising the results.
- Computation: the processing, sequence alignment, compression and analysis of the digitised sequence.
The biggest consumer of compute runtime is sequence alignment.
Proposed Solution and Impact
Sequence alignment can be split into many processing jobs that run in parallel, each taking a batch of reads and aligning them against a reference genome. Significant input data is required, but there is little or no inter-node communication. The most computationally intensive kernel in the overall process is local sequence alignment, using algorithms such as Smith-Waterman, which is very well suited to optimisation on heterogeneous compute technologies such as Maxeler’s Dataflow Engines.
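To make the structure of this workload concrete, the following is a minimal Python sketch of Smith-Waterman local alignment scoring together with a simple batching helper. It is illustrative only and is not CloudLightning's or Maxeler's implementation: the scoring values (match, mismatch, linear gap penalty), the function names and the batch size are all assumptions chosen for the example, and production aligners typically use affine gap penalties and hardware-specific optimisations.

```python
def smith_waterman_score(read, reference, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of `read` against `reference`.

    Scoring values are illustrative assumptions (linear gap penalty).
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(reference) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            same = read[i - 1] == reference[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def batches(reads, size):
    """Split the list of reads into consecutive batches of at most `size`."""
    for start in range(0, len(reads), size):
        yield reads[start:start + size]


def align_batch(batch, reference):
    """Score one batch; it needs only its own reads plus the reference."""
    return [smith_waterman_score(read, reference) for read in batch]


if __name__ == "__main__":
    reference = "TTACGTTTGACCA"  # hypothetical toy reference
    reads = ["ACGT", "GACC", "AAAA", "TTTG", "CCAT"]  # hypothetical reads
    for batch in batches(reads, size=2):
        print(batch, align_batch(batch, reference))
```

Because `align_batch` depends only on its own reads and the shared reference, each batch could in principle be dispatched to a separate worker or node with no communication between them, which is the property the paragraph above describes. The doubly nested loop is the kernel that dominates runtime and is the natural candidate for acceleration.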
CloudLightning’s WP7 will demonstrate the use of a large-scale simulation for genome processing, in anticipation of greater energy efficiency and therefore lower costs. As the cost of the raw sequencing technology drops, the computing challenge becomes the final significant technology bottleneck preventing the routine use of genomics data in clinical settings.
CloudLightning will target this bottleneck both by using heterogeneous computing technologies to offer significantly improved performance/cost and performance/Watt, and by enabling this computation to be hosted at large scale in the cloud, making it practical for widespread use. In addition to bringing the computation costs of genome processing back into line with sequencing costs, this can significantly improve the throughput and speed of genome sequence computation. The effect will be to reduce the wider cycle time, thus increasing the volume and quality of related research.