Data-driven determination of number of discrete conformations in single-particle cryo-EM

Single-particle cryo-EM can be used to image heterogeneous samples containing multiple molecular species, different oligomeric states or distinct conformations. Resolving this heterogeneity, however, requires expert-user knowledge and trial-and-error experimentation to determine the correct number of conformations present in a mixture. Here, we propose an approach to address the problem of automatically determining the number of discrete conformations. We do this by systematically evaluating all possible partitions of the data and selecting the result that maximizes the average variance of similarities measured between particle images and the corresponding 3D reconstructions. We validated our strategy using in-silico mixtures obtained by combining images from closely related membrane proteins and naturally occurring mixtures from dynamic protein complexes.

 

Computer Methods and Programs in Biomedicine, 2022.

Abstract

One of the strengths of single-particle cryo-EM compared to other structure determination techniques is its ability to image heterogeneous samples containing multiple molecular species, different oligomeric states or distinct conformations. This is achieved using routines for in-silico 3D classification that are now well established in the field and have successfully been used to characterize the structural heterogeneity of important biomolecules. These techniques, however, rely on expert-user knowledge and trial-and-error experimentation to determine the correct number of conformations, making the procedure labor-intensive, subjective, and difficult to reproduce. We propose an approach to address the problem of automatically determining the number of discrete conformations present in heterogeneous single-particle cryo-EM datasets. We do this by systematically evaluating all possible partitions of the data and selecting the result that maximizes the average variance of similarities measured between particle images and the corresponding 3D reconstructions. Using this strategy, we successfully analyzed datasets of heterogeneous protein complexes, including (1) in-silico mixtures obtained by combining closely related antibody-bound HIV-1 Env trimers and other important membrane channels, and (2) naturally occurring mixtures from diverse and dynamic protein complexes representing varying degrees of structural heterogeneity and conformational plasticity. The availability of unsupervised strategies for 3D classification, combined with existing approaches for fully automatic pre-processing and 3D refinement, represents an important step towards converting single-particle cryo-EM into a high-throughput technique.
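
The selection criterion described in the abstract can be illustrated with a minimal sketch. This is one possible reading of "average variance of similarities", not the published implementation: it assumes that, for each candidate number of classes K, a 3D classification run has already produced a class label for every particle and a similarity score between each particle image and the reconstruction of its assigned class. The function names, the input layout, and the use of the population variance are all assumptions made for illustration.

```python
import numpy as np


def average_similarity_variance(labels, similarities):
    """Mean, over the classes present, of the variance of particle-to-reconstruction similarities.

    labels:       class index assigned to each particle (hypothetical input layout).
    similarities: similarity of each particle image to the 3D reconstruction
                  of its assigned class (hypothetical input layout).
    """
    labels = np.asarray(labels)
    similarities = np.asarray(similarities, dtype=float)
    if labels.shape != similarities.shape:
        raise ValueError("labels and similarities must have the same shape")
    # Population variance of the scores inside each non-empty class.
    per_class = [similarities[labels == c].var() for c in np.unique(labels)]
    return float(np.mean(per_class))


def select_number_of_classes(candidates):
    """Pick the candidate K whose partition maximizes the average variance.

    candidates: dict mapping K to a (labels, similarities) pair obtained from
                a classification run with K classes (assumed to exist already).
    """
    scores = {
        k: average_similarity_variance(labels, sims)
        for k, (labels, sims) in candidates.items()
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

In this sketch the classification runs themselves are treated as given; only the final comparison across candidate values of K is shown, and the exact similarity measure and averaging scheme should be taken from the paper.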