Dear Phil,
Your observation that the refinement details in PDB format REMARKs
are difficult to interpret and compare is well taken. Each refinement
package produces its own set of refinement results calculated in its
own way. Both the calculation and presentation of this information in
PDB format differ between programs, and even between program
versions. The lack of standardization in how refinement information
is reported is confusing to many PDB users.
In the spirit of supporting innovation, the PDB has historically
tried to accommodate this diversity by providing program- and version-
specific REMARK 3 formats. However, the field of structural biology
has matured considerably in the past few decades, and time-tested,
consensus, and best-practice approaches can now be defined in many
cases. In our view, adopting such approaches (rather than
accommodating every variant ever implemented) would be the best way to
serve the interests of both non-expert user communities and the
experimental structural biology community.
As an illustration, it is interesting to note that there are at least
20 different types of R-values reported in the current archive. The
subtle differences in these quantities may be of interest in
understanding the evolution of refinement methodology. However, we
believe that a smaller, common set of well-defined data items
describing refinement results would be more useful to the broader
community of PDB users.
To this end, the wwPDB maintains an Exchange Data Dictionary of
community-vetted definitions and examples of each data item in the
PDB archive. This is an extensible dictionary that grows with new
technologies and science. For instance, wwPDB has used this
extensibility to capture and define all the various R-values. While
the dictionary technology provides a framework for definition and
standardization, this only addresses part of the problem.
Even though we have precise definitions for the wide range of R-value
types, R-value comparisons between entries are still complicated
because the values are not uniformly populated across the archive. To
fully address the problem, we not only need the standardization
provided by the dictionary technology but also the cooperation of the
software package developers in producing a common set of statistics
and diagnostics. This does not preclude reporting new and novel data
items, but these should be provided in addition to a common core of
results.
Further information about the PDB Exchange Data Dictionary can be
found at our dictionary resource site, http://mmcif.pdb.org/
Correspondence information between our PDB Exchange Data Dictionary
and items in the current PDB format is also available at
http://mmcif.pdb.org/dictionaries/pdb-correspondence/pdb2mmcif-2010.html
Sincerely,
Christine Zardecki
for the wwPDB
From: Phil Jeffrey <pjeff...@princeton.edu>
Date: May 19, 2010 4:02:22 PM EDT
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3
Reply-To: Phil Jeffrey <pjeff...@princeton.edu>
Compare these two lines from phenix.refine:
REMARK 3 NUMBER OF REFLECTIONS : 46001
REMARK 3 FREE R VALUE TEST SET COUNT : 2339
with those from refmac, ostensibly using the same data and start pdb:
REMARK 3 NUMBER OF REFLECTIONS : 43672
REMARK 3 FREE R VALUE TEST SET COUNT : 2339
I know there are 46011 reflections with |F|>0 in the files I used.
phenix.refine removes 10 of these as outliers. The 46001 remaining
reported in REMARK 3 *include* the test set.
With REFMAC, 43672+2339=46011 so it appears that Refmac reports
just the *working* set count in that first line, excluding the test
set.
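The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names parse_remark3_counts and classify_count are hypothetical, the labels are matched exactly as quoted in this message rather than against the full REMARK 3 specification, and the input totals (46011 reflections, 10 outliers) are the ones given here.

```python
import re

def parse_remark3_counts(lines):
    """Collect 'REMARK 3 <label> : <integer>' records into a dict.

    Hypothetical helper: it matches the labels as quoted in this message,
    not the full REMARK 3 specification.
    """
    counts = {}
    for line in lines:
        m = re.match(r"REMARK\s+3\s+(.+?)\s*:\s*(\d+)\s*$", line)
        if m:
            counts[m.group(1).strip()] = int(m.group(2))
    return counts

def classify_count(counts, n_input, n_rejected=0):
    """Guess what NUMBER OF REFLECTIONS refers to, given the known input total."""
    reported = counts["NUMBER OF REFLECTIONS"]
    test = counts["FREE R VALUE TEST SET COUNT"]
    usable = n_input - n_rejected  # reflections actually used in refinement
    if reported == usable:
        return "working + test"
    if reported + test == usable:
        return "working only"
    return "unknown"

phenix = parse_remark3_counts([
    "REMARK 3 NUMBER OF REFLECTIONS : 46001",
    "REMARK 3 FREE R VALUE TEST SET COUNT : 2339",
])
refmac = parse_remark3_counts([
    "REMARK 3 NUMBER OF REFLECTIONS : 43672",
    "REMARK 3 FREE R VALUE TEST SET COUNT : 2339",
])

# 46011 reflections with |F|>0; phenix.refine rejects 10 as outliers.
print(classify_count(phenix, n_input=46011, n_rejected=10))
print(classify_count(refmac, n_input=46011))
```

With the numbers quoted here, the first call reports the count as working plus test set and the second as working set only, which is the discrepancy in question.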
Is this a bug with one program or the other, or a bug in the PDB
definition of REMARK 3?
http://www.wwpdb.org/documentation/format23/remark3.html
This appears to be a source of inconsistency.
phenix.refine 1.6-289
refmac5 5.4.0077 (I'm apparently a Luddite)
Phil Jeffrey
Princeton