Cryo-electron microscopy (cryoEM) coupled with single-particle reconstruction (SPR) is a method for determining the three-dimensional structure of macromolecules and complexes in the form of a Coulombic (charge) density map (see reference 4, Literature review Docs page). These maps mostly detail the locations of atoms’ electrons and are similar in appearance and content to X-ray crystallographic electron density maps. The level of detail ranges from gross structural features (>10 Å) to individual atomic positions (<3.5 Å). In high-resolution maps, some portion of the imaged molecule may be traced to give an atomic model, as in X-ray crystallography. For lower-resolution data, near-atomic detail can be inferred by docking existing X-ray crystal structures, if available, into the map. The resulting information is as useful as that obtained from X-ray crystallography: it can be used to determine the composition, stoichiometry and arrangement of multi-subunit complexes, deduce structural mechanisms of biological activity, guide mutagenesis studies, and reveal novel intermolecular interactions.
The basic premise is that a single low-dose EM image is extremely noisy (low signal-to-noise ratio, SNR), but averaging together many structurally similar particle images in the same orientation increases the SNR and reveals a high level of detail (higher resolution). To do this effectively, EM images of thousands to millions of particles sampling all possible orientations need to be collected and aligned with each other, and their relative orientations determined to high precision. Further, the particles must be very similar to each other for averaging to reveal high detail.
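The effect of averaging can be illustrated with a minimal sketch. It assumes a toy 1-D "particle" (a sine wave), independent Gaussian noise, and perfectly aligned copies; the function names and the noise level are illustrative and not part of any SPR package. Under these assumptions the residual noise should fall roughly as 1/sqrt(N) as N copies are averaged.

```python
import math
import random
import statistics


def average_noisy_copies(signal, n_copies, noise_sigma, rng):
    """Average n_copies of `signal`, each with independent Gaussian noise added."""
    total = [0.0] * len(signal)
    for _ in range(n_copies):
        for i, value in enumerate(signal):
            total[i] += value + rng.gauss(0.0, noise_sigma)
    return [t / n_copies for t in total]


def residual_noise(signal, estimate):
    """Standard deviation of the error left in `estimate` relative to `signal`."""
    return statistics.pstdev([e - s for e, s in zip(estimate, signal)])


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]  # toy 1-D "particle"
noise_sigma = 5.0  # hypothetical noise level, much larger than the signal

noise_by_count = {}
for n_copies in (1, 16, 256):
    averaged = average_noisy_copies(signal, n_copies, noise_sigma, rng)
    noise_by_count[n_copies] = residual_noise(signal, averaged)
    print(n_copies, round(noise_by_count[n_copies], 2))
```

Real particle images must first be aligned and sorted by orientation before averaging helps, which is why the later processing phases matter.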
CryoEM imaging is accomplished by flash-freezing macromolecules in a thin layer of aqueous solution on a holey carbon-coated TEM grid at liquid nitrogen temperatures, which embeds them in vitreous ice. Ideally, the ice is just thick enough to cover the particles, which are evenly distributed in each hole without overlapping each other. The sample is placed in the microscope under high vacuum and imaged with high-energy (200-300 keV) electrons under low-dose conditions (~50 e⁻/Å²). The imaging is similar to light microscopy, except that electromagnetic lenses control the electron beam and the images (movie stacks) are recorded using a direct electron detector (DED). The development of sensitive and fast DEDs was one of the critical achievements that led to the “Resolution Revolution”, as previous detection methods (CCD/CMOS/film) have significantly lower detective quantum efficiency (DQE) at high resolution than DEDs. Higher DQE means that small particles are more easily visualized, pushing the DED-detectable molecule size down to ~100 kDa. The other critical advance was the ability to account for the motion of the sample during data collection by taking “movies” (a series of low-dose frames) that are then aligned and merged to create a single higher-resolution image.
CryoEM-SPR bypasses the major constraint of X-ray crystallography: crystals. This means that, in theory, any protein or nucleic acid, alone or in complex, can be visualized and used to generate a 3-D reconstruction. In particular, many large membrane proteins, which are often difficult to crystallize, have proven to be good SPR substrates. However, there are still important, sample-dependent limitations that need to be overcome to obtain high-resolution maps:
- size and shape: Because C, N, O and H atoms scatter electrons weakly, SNR and contrast are low, so particles generally need to be larger than 150 kDa in mass. A less spherical shape is easier to align and results in a more effective structure solution. Helical filaments and symmetrical arrays such as virus-like particles can also assist image reconstruction because molecular orientations are more easily assigned. Smaller molecules have been imaged to high resolution (e.g. hemoglobin, 64 kDa), but these are special cases. Consult with us if you have a project like this.
- sample heterogeneity: Image processing relies on class averaging and orientation determination of particles. To obtain reliable 2-D and 3-D class averages, the constitutional and conformational differences between molecules need to be minimized. A successful project therefore starts with size-exclusion purified, chemically and structurally homogeneous samples. While some heterogeneity can be compensated for computationally, the problem is getting enough similar particles. Even with pure samples, atomic structures sometimes use as few as 2% of the particles picked for further analysis, so projects must start with hundreds of thousands to millions of particles. Starting with impure material will further reduce the number of useable particles.
- useable grid conditions: The goal is 100-1000 well-separated individual particles per image. Vitrification, which embeds the particles in thin ice (50-100 nm), is a harsh process. Even with pure samples, it can lead to partial or complete denaturation and/or aggregation. The carbon film on the grid itself can also bind macromolecules and prevent them from entering the holes. These problems can be overcome by extensive screening of buffer conditions, detergents, additives and grid types. Determining these conditions is one of the greatest hurdles to solving a structure by SPR and can be labor-intensive.
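The figures quoted in the list above (as little as 2% of picked particles retained, 100-1000 particles per image) can be combined into a rough collection budget. The sketch below is only arithmetic under stated assumptions: the target of 50,000 final particles is a hypothetical number chosen for illustration, and `collection_budget` is an illustrative helper, not a Facility tool.

```python
import math
from fractions import Fraction


def collection_budget(final_particles, retained_fraction, particles_per_image):
    """Return (particles to pick, micrographs to collect) for a target final count."""
    picked = math.ceil(final_particles / retained_fraction)
    micrographs = math.ceil(Fraction(picked, particles_per_image))
    return picked, micrographs


target = 50_000  # hypothetical number of particles wanted in the final map
retained = Fraction(2, 100)  # the 2% worst case quoted above
sparse = collection_budget(target, retained, 100)   # 100 particles per image
dense = collection_budget(target, retained, 1000)   # 1000 particles per image
print(sparse, dense)
```

Under these assumptions, a tenfold difference in particle density on the grid translates directly into a tenfold difference in the number of images to collect, which is one reason grid optimization pays off.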
Once samples are imaged, computation (image processing) plays a critical role in all subsequent phases of the SPR method. Phases 1-4 can initially be performed in the BioEM facility but may need to be moved onto users’ computers in subsequent iterations.
Phase 1) On-the-fly Motion Correction
Samples inside the electron microscope constantly drift and vibrate. Over longer exposure times, these motions blur the images. Instead of a single image, movie stacks (100 frames/stack) are recorded by the DED, and UCSF MotionCor2 aligns, merges and dose-weights the frames to generate a final dose-weighted, motion-corrected image. This computation can be performed during data collection (“on-the-fly”) or later (Phase 2).
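The idea of aligning frames before merging them can be sketched as follows. This is a deliberately simplified toy, not MotionCor2's algorithm: it assumes whole-frame integer-pixel drift, finds each shift by brute-force cross-correlation against the first frame, and omits dose weighting and local (patch-based) motion entirely. The image size, drift values and noise level are hypothetical.

```python
import random


def shift_frame(frame, dy, dx):
    """Circularly shift a 2-D list so its content moves by (dy, dx) pixels."""
    h, w = len(frame), len(frame[0])
    return [[frame[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def correlation(a, b):
    """Sum of pixel-wise products of two equally sized frames."""
    return sum(pa * pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def estimate_shift(reference, frame, max_shift):
    """Return the (dy, dx) that, applied to `frame`, best matches `reference`."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = correlation(reference, shift_frame(frame, dy, dx))
            if best is None or score > best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


def align_and_average(frames, max_shift=3):
    """Align every frame to the first one, then average the aligned frames."""
    reference = frames[0]
    h, w = len(reference), len(reference[0])
    total = [[0.0] * w for _ in range(h)]
    shifts = []
    for frame in frames:
        dy, dx = estimate_shift(reference, frame, max_shift)
        shifts.append((dy, dx))
        aligned = shift_frame(frame, dy, dx)
        for y in range(h):
            for x in range(w):
                total[y][x] += aligned[y][x]
    n = len(frames)
    return [[v / n for v in row] for row in total], shifts


rng = random.Random(0)  # fixed seed so the run is repeatable
size = 16
# Toy "particle": a bright 4x4 square on an empty background.
clean = [[10.0 if 6 <= y <= 9 and 6 <= x <= 9 else 0.0 for x in range(size)]
         for y in range(size)]
drift = [(0, 0), (1, 0), (1, 1), (2, 1)]  # hypothetical per-frame drift in pixels
frames = []
for dy, dx in drift:
    moved = shift_frame(clean, dy, dx)
    frames.append([[v + rng.gauss(0.0, 0.5) for v in row] for row in moved])

averaged, shifts = align_and_average(frames)
print(shifts)  # each recovered shift should undo the corresponding drift
```

Production tools work at sub-pixel precision and on much noisier frames, so this sketch is only meant to show why alignment has to come before summation.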
Phase 2) Off-the-pipeline Motion Correction
Another challenge is that data sets are very large (2-4 TB for a raw data set, 100-200 GB for merged data sets), so moving and processing the data can be very slow. Although the BioEM facility is equipped with high-speed, high-performance data transfer and image processing infrastructure, we usually do not repeat the motion correction and CTF (contrast transfer function) determination steps, because doing so may significantly slow down data collection and the downstream image processing pipeline. Users take over any processed or unprocessed data and continue processing on their own workstations, off the “on-the-fly” pipeline.
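To get a feel for why moving a raw data set is slow, a back-of-the-envelope estimate may help. The sketch assumes decimal terabytes (10**12 bytes) and hypothetical sustained link speeds of 1 and 10 Gbit/s; these are not measurements of the Facility's network, and `transfer_hours` is an illustrative helper.

```python
def transfer_hours(size_tb, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move size_tb terabytes (10**12 bytes each) over a link."""
    bits = size_tb * 1e12 * 8
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600


for size_tb in (2, 4):  # raw data set sizes quoted above
    for link in (1, 10):  # hypothetical link speeds in Gbit/s
        hours = round(transfer_hours(size_tb, link), 1)
        print(size_tb, "TB at", link, "Gbit/s:", hours, "h")
```

The `efficiency` parameter is a crude stand-in for protocol overhead and disk limits; real transfers rarely sustain the nominal link rate.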
Phase 3) Particle Picking
After frames are motion-corrected, particles need to be picked for 2-D and 3-D class averaging. Doing this entirely by hand is impractical, so semi- or fully automated methods are applied. Picking can be automated during data collection (“on-the-fly”) for a preliminary evaluation, with variable success. Typically, particle picking is repeated several times to improve results.
Phase 4) 2-D Classification
2-D class averages are the means by which a data set is evaluated. Their creation can be automated together with particle picking. Each particle image is essentially a 2-D projection of the molecule in an orientation that needs to be determined. 2-D classification groups particles with the same projection and enhances the SNR by aligning and averaging them. Because correctly classified 2-D images have greatly enhanced signal, structural details such as protein secondary structure can be revealed. Heterogeneous or radiation-damaged samples will not converge to produce good 2-D averages. Ideally, the class averages will also show several different views (orientations) of your molecule, indicating no orientation preference. Again, molecular shape and orientation have a big effect on this step, as more distinct structural features lead to better alignments. If 2-D classification does not converge to reveal helices with default parameters, additional parameter screening, stricter particle re-picking, or further sample preparation is necessary to improve the results.
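The grouping-then-averaging idea can be sketched with a toy two-class k-means. This is a loose analogy under strong assumptions, not the method used by SPR software: the "images" are 1-D vectors, the particles are already aligned (no in-plane rotation or shift search), there are exactly two views, and the noise level is hypothetical and far milder than in real micrographs.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_image(images):
    """Pixel-wise mean of a list of equally long vectors."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def classify_2d(particles, n_iter=10):
    """Toy two-class k-means; returns (labels, class_averages)."""
    # Deterministic start: the first particle and the particle farthest from it.
    first = particles[0]
    farthest = max(particles, key=lambda p: sq_dist(first, p))
    averages = [first, farthest]
    labels = []
    for _ in range(n_iter):
        # Assign each particle to the nearest current class average.
        labels = [min((0, 1), key=lambda k: sq_dist(p, averages[k])) for p in particles]
        # Recompute each class average from its members.
        for k in (0, 1):
            members = [p for p, lab in zip(particles, labels) if lab == k]
            if members:
                averages[k] = mean_image(members)
    return labels, averages


rng = random.Random(1)  # fixed seed so the run is repeatable
view_a = [1.0 if i < 32 else 0.0 for i in range(64)]  # hypothetical projection A
view_b = [0.0 if i < 32 else 1.0 for i in range(64)]  # hypothetical projection B
truth = [i % 2 for i in range(200)]  # alternate the two views
particles = [[v + rng.gauss(0.0, 0.5) for v in (view_b if t else view_a)]
             for t in truth]

labels, averages = classify_2d(particles)
print(sq_dist(particles[0], view_a), sq_dist(averages[0], view_a))
```

The second printed distance should be much smaller than the first, reflecting the signal enhancement that class averaging provides once particles are grouped correctly.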
This work can be done in the BioEM Facility with Facility computational resources in the “on-the-fly” mode. In some cases, it will be possible to redo motion correction, CTF estimation and particle picking after data have been collected. However, this depends on the Facility’s workload. Priority for resources will always go to newly collected data, so post-collection reprocessing is done at the discretion of the Facility and is not guaranteed. Projects demanding greater computational power for these steps will need to be carried out on non-Facility computers.
Phase 5) 3-D Reconstruction
This is the next most time-consuming and iterative process of a cryoEM-SPR project. It requires trained personnel, significant GPU-enabled computational resources and a fair amount of time, all of which need to be supplied by the user. We recommend consulting with Fei Guo and other major BioEM users prior to starting a project to understand these needs and resources completely.