Mathematical Theory, Computational Challenges, and Opportunities

Structural biology studies the structure and dynamics of macromolecules to broaden our knowledge about the mechanisms of life and impact the drug-discovery process. Owing to recent groundbreaking developments, chiefly in hardware technologies and data processing techniques, many new molecular structures have been elucidated to near-atomic resolutions using cryo-EM.

IEEE Signal Processing Magazine, 37(2), 2020.

Abstract

In recent years, an abundance of new molecular structures have been elucidated using cryo-electron microscopy (cryo-EM), largely due to advances in hardware technology and data processing techniques. Owing to these exciting new developments, cryo-EM was selected by Nature Methods as the “Method of the Year 2015,” and the Nobel Prize in Chemistry 2017 was awarded to three pioneers in the cryo-EM field: Jacques Dubochet, Joachim Frank, and Richard Henderson “for developing cryoelectron microscopy for the high-resolution structure determination of biomolecules in solution” [93]. The main goal of this article is to introduce the challenging and exciting computational tasks involved in reconstructing 3D molecular structures by cryo-EM. Determining molecular structures requires a wide range of computational tools in a variety of fields, including signal processing, estimation and detection theory, high-dimensional statistics, convex and nonconvex optimization, spectral algorithms, dimensionality reduction, and machine learning. The tools from these fields must be adapted to work under exceptionally challenging conditions, including extreme noise levels, the presence of missing data, and massive data sets as large as several terabytes. In addition, we present two statistical models, multireference alignment (MRA) and multitarget detection (MTD), that abstract away much of the intricacy of cryo-EM while retaining some of its essential features. Based on these abstractions, we discuss some recent intriguing results in the mathematical theory of cryo-EM and delineate relations with group, invariant, and information theories.