Re: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3

Christine Zardecki Fri, 21 May 2010 06:17:30 -0700

Dear Phil,

Your observation that the refinement details in PDB format REMARKs are difficult to interpret and compare is well taken. Each refinement package produces its own set of refinement results calculated in its own way. Both the calculation and the presentation of this information in PDB format differ between programs, and even between program versions. The lack of standardization in how refinement information is reported is confusing to many PDB users.

In the spirit of supporting innovation, the PDB has historically tried to accommodate this diversity by providing program- and version-specific REMARK 3 formats. However, the field of structural biology has matured considerably in the past few decades, and time-tested, consensus, and best-practice approaches can now be defined in many cases. In our view, adopting such approaches (rather than accommodating every variant ever implemented) would be the best way to serve the interests of both non-expert user communities and the experimental structural biology community.

As an illustration, it is interesting to note that there are at least 20 different types of R-values reported in the current archive. The subtle differences in these quantities may be of interest in understanding the evolution of refinement methodology. However, we believe that a smaller, common set of well-defined data items describing refinement results would be more useful to the broader community of PDB users.

To this end, the wwPDB maintains an Exchange Data Dictionary of community-vetted definitions and examples of each data item in the PDB archive. This is an extensible dictionary that grows with new technologies and science. For instance, wwPDB has used this extensibility to capture and define all the various R-values. While the dictionary technology provides a framework for definition and standardization, this only addresses part of the problem.

Even though we have precise definitions for the wide range of R-value types, R-value comparisons between entries are still complicated because the values are not uniformly populated across the archive. To fully address the problem, we need not only the standardization provided by the dictionary technology but also the cooperation of the software package developers in producing a common set of statistics and diagnostics. This does not preclude reporting new and novel data items, but these should be provided in addition to a common core of data results.

Further information about the PDB Exchange Data Dictionary can be found at our dictionary resource site, http://mmcif.pdb.org/

Correspondence information between our PDB Exchange Data Dictionary and items in the current PDB format is also available at

http://mmcif.pdb.org/dictionaries/pdb-correspondence/pdb2mmcif-2010.html

Sincerely,

Christine Zardecki
for the wwPDB

From: Phil Jeffrey <pjeff...@princeton.edu>
Date: May 19, 2010 4:02:22 PM EDT
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Question: Refmac5 stats reported in pdb REMARK 3
Reply-To: Phil Jeffrey <pjeff...@princeton.edu>



Compare these two lines from phenix.refine:
REMARK   3   NUMBER OF REFLECTIONS             : 46001
REMARK   3   FREE R VALUE TEST SET COUNT      : 2339

with those from refmac, ostensibly using the same data and start pdb:
REMARK   3   NUMBER OF REFLECTIONS             :   43672
REMARK   3   FREE R VALUE TEST SET COUNT      :  2339


I know there are 46011 reflections with |F|>0 in the files I used.
phenix.refine removes 10 of these as outliers. The 46001 remaining reported in REMARK 3 *include* the test set.
With REFMAC, 43672+2339=46011, so it appears that Refmac reports just the *working* set count in that first line, excluding the test set.
Is this a bug in one program or the other, or a bug in the PDB definition of REMARK 3? http://www.wwpdb.org/documentation/format23/remark3.html
This appears to be a source of inconsistency.
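
The arithmetic above can be checked with a short script. This is only a minimal sketch and is not part of phenix.refine or Refmac: the helper name remark3_counts is made up for illustration, and the totals 46011 and 10 are simply the figures quoted in this message.

```python
import re

def remark3_counts(lines):
    """Parse 'REMARK 3  LABEL : VALUE' lines into (reflections, test set)."""
    counts = {}
    for line in lines:
        m = re.match(r"REMARK\s+3\s+(.+?)\s*:\s*(\d+)\s*$", line)
        if m:
            counts[m.group(1).strip()] = int(m.group(2))
    return counts["NUMBER OF REFLECTIONS"], counts["FREE R VALUE TEST SET COUNT"]

# REMARK 3 lines as quoted in the message
phenix = [
    "REMARK   3   NUMBER OF REFLECTIONS             : 46001",
    "REMARK   3   FREE R VALUE TEST SET COUNT      : 2339",
]
refmac = [
    "REMARK   3   NUMBER OF REFLECTIONS             :   43672",
    "REMARK   3   FREE R VALUE TEST SET COUNT      :  2339",
]

TOTAL_F_GT_0 = 46011   # reflections with |F|>0, as stated in the message
PHENIX_OUTLIERS = 10   # reflections phenix.refine is said to remove

p_refl, p_free = remark3_counts(phenix)
r_refl, r_free = remark3_counts(refmac)

# phenix.refine: reported count looks like working + test set, minus outliers
phenix_includes_test = (p_refl == TOTAL_F_GT_0 - PHENIX_OUTLIERS)
# Refmac: reported count looks like the working set only
refmac_excludes_test = (r_refl + r_free == TOTAL_F_GT_0)

print(phenix_includes_test, refmac_excludes_test)
```

Both checks agreeing with the numbers quoted here would be consistent with the two programs using different conventions for the same REMARK 3 line. It does not show which convention the format definition intends.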

phenix.refine 1.6-289
refmac5 5.4.0077      (I'm apparently a Luddite)

Phil Jeffrey
Princeton
