Machine learning in the analysis of biomolecular Advances in physics: X simulations
Kaptan, Shreyas; Vattulainen, Ilpo
Abstract
Machine learning has rapidly become a key method for the analysis and organization of large-scale data in all scientific disciplines. In life sciences, the use of machine learning techniques is a particularly appealing idea since the enormous capacity of computational infrastructures generates terabytes of data through millisecond simulations of atomistic and molecular-scale biomolecular systems. Due to this explosion of data, the automation, reproducibility, and objectivity provided by machine learning methods are highly desirable features in the analysis of complex systems. In this review, we focus on the use of machine learning in biomolecular simulations. We discuss the main categories of machine learning tasks, such as dimensionality reduction, clustering, regression, and classification used in the analysis of simulation data. We then introduce the most popular classes of techniques involved in these tasks for the purpose of enhanced sampling, coordinate discovery, and structure prediction. Whenever possible, we explain the scope and limitations of machine learning approaches, and we discuss examples of applications of these techniques.