This is a guest blog post by Iain Bethune. Iain is an HPC project manager at EPCC, prime number finder, space enthusiast, coder, runner, whitewater kayaker, and father of four. He collaborates with a range of researchers around the UK and Europe in fields as diverse as Computational Chemistry, Mathematics and Biomechanical Engineering. He develops scalable software to allow researchers to harness HPC resources such as HECToR and ARCHER – the UK National HPC Services – and the European PRACE infrastructure. Most of his work has been in the area of Materials Science, and he is a developer on the CP2K project – a powerful program enabling Density Functional Theory and other atomistic modelling techniques to be applied to a diverse range of systems using tens of thousands of CPU cores at a time.
Over the last 10 years, the growth in performance of HPC systems has come largely from increasing core counts, which poses a question for application developers and users: how best to make use of the parallelism on offer? In the field of biomolecular simulation, for example, strong scaling (using more cores for a fixed problem size) is typically limited to a few hundred CPU cores (or a single GPU) for system sizes of 10-100 thousand atoms. Weak scaling (increasing the problem size in proportion to the number of cores) produces good performance numbers for systems with up to a billion atoms, but most scientifically relevant simulations are much smaller.
To understand the behaviour and function of a biomolecular system requires achieving good sampling of the conformational space of the system: essentially, running a long enough simulation that the molecule has time to randomly explore all the possible shapes it is capable of taking. This sounds simple, but in practice poses a real problem: a Molecular Dynamics simulation visits states with a probability that decreases exponentially with the free energy of that state. Biological systems usually have a small number of low-energy states (for example, different foldings of a protein) separated by high energy barriers. Crossing a barrier has such a low probability that it might take a few milliseconds or longer for a molecule to spontaneously leave its current state and transition to a new one. The problem is that even with custom high-performance hardware and software, it can take months of computing to simulate even a single millisecond of dynamics! As a result, various methods have been developed, such as Metadynamics and Accelerated Dynamics, that aim to push simulations over energy barriers and speed up the exploration of the conformational space. Another approach is, instead of running a single MD simulation, to run a swarm of hundreds or more simulations starting from the same point, increasing the chance that at least one of them will cross a barrier.
These swarm or ensemble simulations are well suited to making use of parallel computing resources, since each simulation is independent, or perhaps loosely coupled via some infrequent analysis step. The ExTASY project has developed a Python framework, Ensemble Toolkit, which provides a powerful but user-friendly API for coding workflows involving ensemble MD simulations coupled to tools which analyse the results of the simulations and propose new start-points for further MD, in an iterative Simulation-Analysis Loop pattern. Building on top of the established distributed computing middleware RADICAL-Pilot, our toolkit allows workflows not only to be specified programmatically, but also to be executed directly on a range of compute resources, including locally or on remote HPC or cloud computing platforms. The differences between execution platforms are abstracted away from the user, and the middleware handles details such as resource allocation (perhaps via a batch system), bootstrapping the Pilot-Job which is used to execute the workflow, data staging to and from the target resource, as well as execution of the workflow, respecting dependencies that are defined between the tasks.
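The Simulation-Analysis Loop pattern itself is simple enough to sketch in a few lines of plain Python. To be clear, this is not the Ensemble Toolkit API, just an illustration of the pattern: in the real framework, each simulate() call would be an MD task (e.g. GROMACS or AMBER) executed in parallel on an HPC resource via a Pilot-Job, and analyse() would be a tool such as LSDMap or CoCo; the function names and the toy dynamics here are invented for the example.

```python
import random

ENSEMBLE_SIZE = 8

def simulate(start_point):
    # Stand-in for one independent MD simulation:
    # randomly perturb the starting conformation.
    return start_point + random.uniform(-1.0, 1.0)

def analyse(results):
    # Stand-in for the coupled analysis step: examine all ensemble
    # members and propose new start points for the next iteration
    # (here, trivially, the most "advanced" conformations).
    return sorted(results, reverse=True)[:ENSEMBLE_SIZE]

start_points = [0.0] * ENSEMBLE_SIZE

for iteration in range(4):
    # Simulation stage: every member runs independently, so in a real
    # workflow these tasks execute concurrently across the machine.
    results = [simulate(s) for s in start_points]
    # Analysis stage: the only point of coupling between members.
    start_points = analyse(results)

print(len(start_points))  # 8
```

The structure makes the parallelism obvious: within each iteration the simulation tasks have no dependencies on each other, only on the preceding analysis, which is exactly the dependency information the middleware uses to schedule the workflow.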
So far, we have worked with researchers from Rice University and the University of Nottingham to implement two different sampling workflows using Ensemble Toolkit: Diffusion-Map-directed MD, which uses the GROMACS program for the simulations and LSDMap to analyse the results, and CoCo-MD, based on the AMBER MD program and an analysis tool called CoCo. Thanks to the platform-independent middleware, we have demonstrated these workflows running on several large HPC systems in the UK and US: ARCHER (a Cray XC30), Stampede (a Linux cluster) and Blue Waters (a Cray XE6/XK7), using over 3000 cores.
The ExTASY project is just one example of how science is adapting to the trends in computing hardware: finding more parallelism to make use of more cores, coupled with the ability to run on whatever platform is available.