EM VALIDATION TASK FORCE CHARGE AND QUESTIONS
Charge to the Committee
The EM Validation Task Force (EM-VTF) is charged with advising members of EMDataBank.org on approaches to validate maps deposited to EMDB and models deposited to PDB that are obtained from cryoEM data.
Specifically, the EM-VTF is to provide recommendations for EM structure validation criteria and tools. Such validation tools can be based either on existing or proposed software; these tools would become freely available for use by individual laboratories. Validation tools will not be used by EMDB or PDB as a basis to “reject” maps or models, but rather to flag potential problems for a depositor or user to be aware of.
The assignment for the first meeting is to discuss the assigned questions (see below), decide on priorities, and develop plans for creating the recommendations. Recommendations will be assembled into "white papers" that can be used to seek support for development and implementation of the validation tools, with the explicit aim of publishing a combined "EM Validation" White Paper in a journal.
The questions below were drafted by the EM-VTF organizing committee and discussion group leaders for consideration by the map and model discussion groups, and are intended to stimulate discussion. All EM-VTF members are welcome to address any of the questions.
(1) How can map accuracy (both noise level and overall “correctness”) be assessed? What statistical values provide an indication of the quality of a reconstruction? What tests can be recommended to validate image processing parameters? In modelling in general, there are three quality criteria for a “model”, whether a density map or an explicit structural model. Because an ensemble of maps (or structural models) is generally consistent with the raw data, these criteria are the precision of the ensemble, the accuracy of a particular map (or model), and the degree to which the map (or model) covers the measured system. Is this perspective useful for describing the quality of a map, and not only of the model(s) based on the map?
(2) How should map resolution be reported? Currently it is self-reported by depositors as a single value with text description, e.g., “determined by FSC at 0.5 cutoff”. Should EMDB require the complete Fourier Shell Correlation information from depositors? Are there other useful indicators/tools?
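As an illustration of what "complete Fourier Shell Correlation information" could contain, the following is a minimal sketch of an FSC curve computed from two independently reconstructed half-maps, together with the single resolution value read off at a chosen cutoff. It assumes cubic maps of equal size held as NumPy arrays; the function names, the `voxel_size` parameter and the first-crossing rule are assumptions made for this example, not a procedure prescribed by EMDB or by any particular package.

```python
import numpy as np


def fourier_shell_correlation(half_map_1, half_map_2):
    """Return the FSC value for each integer-radius shell in Fourier space.

    Both inputs are assumed to be cubic 3D arrays of identical shape.
    """
    if half_map_1.shape != half_map_2.shape:
        raise ValueError("half-maps must have the same shape")
    n = half_map_1.shape[0]
    f1 = np.fft.fftn(half_map_1)
    f2 = np.fft.fftn(half_map_2)

    # Integer frequency index along each axis, then the shell radius per voxel.
    freqs = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        mask = shell == s
        numerator = np.real(np.sum(f1[mask] * np.conj(f2[mask])))
        denominator = np.sqrt(
            np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2)
        )
        fsc[s] = numerator / denominator if denominator > 0 else 0.0
    return fsc


def resolution_at_cutoff(fsc, box_size, voxel_size, cutoff=0.5):
    """Return the resolution at the first shell where the FSC drops below cutoff.

    Resolution is box_size * voxel_size / shell_index, in the units of
    voxel_size. Returns None if the curve never drops below the cutoff.
    """
    below = np.nonzero(np.asarray(fsc) < cutoff)[0]
    if below.size == 0 or below[0] == 0:
        return None
    return box_size * voxel_size / below[0]
```

Depositing the full `fsc` array rather than only the value returned by `resolution_at_cutoff` would let a user apply a different cutoff or inspect the shape of the curve.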
(3) What density manipulation/filtering procedures should be specified for the deposited map densities? Should any procedures be disallowed? Examples: negative density truncation, profile fitting, low or high pass filtering, B-factor sharpening, cropping, masking, FOM weighting.
(4) Would it be desirable to have a tool to validate map point group/helical symmetry and define orientation and position with respect to a standard frame?
(5) What parameters should be used to indicate reconstruction quality, noise level and signal-to-noise ratio for 3D tomograms and sub-tomogram averages?
(1) What kinds of structural models (e.g., atomic, coarse-grained, single models, ensembles of models) do we expect based on EM maps?
(2) What general criteria in principle describe the quality of a “model”, including an explicit macromolecular structural model based on an EM map (e.g., precision, accuracy, coverage)?
(3) What can we learn from other assessment efforts (e.g., X-ray, NMR, homology and ab initio protein structure modelling, SAXS, integrated modelling, EM map)? Should we strive for a common language and framework across these fields, allowing us to understand the commonalities and differences and thus benefit from concepts and tools developed by a much larger community?
(4) How should the fit of an atomic model into an EM map be evaluated? Are there suitable local (e.g., per residue, per secondary-structure element, or per domain) measures? Are there suitable global fit measures? (How) would assessment vary with map resolution, or with fitting protocol (e.g., rigid-body vs. annealing)?
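To make the distinction between global and local fit measures concrete, the following is a minimal sketch of one possible measure: a real-space correlation coefficient between the EM map and a density simulated from the model, computed either over the whole volume or within a mask. It assumes both densities are already sampled on the same grid as NumPy arrays, and that a boolean mask (hypothetical here) selects the voxels around a residue, secondary-structure element or domain. The function name and interface are illustrative only, and this is not put forward as the recommended measure.

```python
import numpy as np


def density_correlation(map_density, model_density, mask=None):
    """Pearson correlation between map and model density.

    With mask=None the correlation is global. With a boolean mask of the
    same shape, only the selected voxels contribute (a local measure).
    """
    a = np.asarray(map_density, dtype=float)
    b = np.asarray(model_density, dtype=float)
    if a.shape != b.shape:
        raise ValueError("map and model density must share one grid")
    if mask is not None:
        a = a[mask]
        b = b[mask]
    else:
        a = a.ravel()
        b = b.ravel()

    # Subtract the mean so the score ignores a constant density offset.
    a = a - a.mean()
    b = b - b.mean()
    denominator = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denominator) if denominator > 0 else 0.0
```

Calling the function once without a mask and once per region mask gives a global score alongside a per-region profile, which is one way to see where a model that fits well overall departs from the map locally.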
(5) How can we evaluate (esp. for tomograms) that a fitted model is the correct one (it could conceivably be the wrong molecule!) or that the solution is unique/optimal? Is there any measure of quality-of-fit or signal-to-noise for (a) selecting a model to fit to a feature and (b) the selected orientation/position?
(6) We know how to validate the stereochemistry/geometry of atomic models, but how should we "value" the results when applied to EM models? How should we handle cases where rigid-body placement/refinement of a high-resolution X-ray structure has been carried out, so that the validation statistics say nothing about the quality of the EM model (except for clashes)? How should we handle cases where refinement/model-building has been done (there may be thousands of outliers even in a good model simply because the resolution is so low)? And (how) should we use clashes?
(7) The precise taxonomy/protein sequence of an EM sample may not be known, or the only high-resolution structures available for fitting may be from a homolog in another species. How should PDB handle the following situations: a model deposited with the wrong sequence(s), or a model deposited on the basis of homology modelling?
(8) How are errors in EM maps and errors in models coupled? This interdependence may become increasingly explicit and mutual (e.g., via iterative map/model calculations). How should the quality of maps be taken into account when estimating the quality of models based on these maps?
posted 8/24/10 CL