On Mar 7, 2009, at 14:51, Gerard Bricogne wrote:
> Thank you for your comments on this topic. I think, however, that no
> amount of format extension or sanity checking will ever replace the
> ultimate sanity of depositing the images themselves. This would
> eliminate the many
Depositing the images would certainly be the best solution in the
long run. But I don't expect it to happen soon: the software
infrastructure isn't there yet, and I suppose most scientists' minds
aren't quite ready yet either.
In the meantime, the PDB could improve the quality of its structure
factor files with little effort by using better sanity checking. Here
are two examples I stumbled over recently, both of which would have
been very easy to catch with straightforward verification tools:
1) PDB entry 2PL8
Lines 1222ff of the structure factor file are:
1 1 1 5 3 0 6339.16 761.27 o
1 1 1 5 3 -1 6810.46 580.22 o
1 1 1 5 3 -2 2976.58 253.95 f
1 1 1 5 3 -312354 0.85 1051.53 o
1 1 1 5 3 -4 5875.30 500.59 o
This looks like a basic mmCIF conversion mistake, which a simple
sanity check on the Miller indices would have detected.
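A minimal sketch of such a check in Python, assuming whitespace-separated rows in which the fourth to sixth columns hold h, k and l, as in the excerpt above. The function name `find_suspicious_indices` and the bound `max_abs_index` are hypothetical; a real tool would more likely derive the bound from the unit cell and the resolution limit.

```python
def find_suspicious_indices(rows, max_abs_index=200):
    """Return (row_number, reason) pairs for rows with implausible h, k, l.

    Assumption: columns 4-6 of each whitespace-separated row are the
    Miller indices. max_abs_index is an arbitrary illustrative bound.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        fields = row.split()
        if len(fields) < 6:
            problems.append((number, "too few columns"))
            continue
        try:
            h, k, l = (int(field) for field in fields[3:6])
        except ValueError:
            # Catches indices that are not plain integers.
            problems.append((number, "non-integer Miller index"))
            continue
        if max(abs(h), abs(k), abs(l)) > max_abs_index:
            problems.append((number, "Miller index out of range"))
    return problems
```

Applied to the five rows quoted above, this flags only the fourth one, whose last index reads as -312354.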
2) PDB entry 2P2O
The PDB file says:
REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 107292
REMARK 200 RESOLUTION RANGE HIGH (A) : 1.740
REMARK 200 RESOLUTION RANGE LOW (A) : 50.000
REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 1.000
REMARK 200
REMARK 200 OVERALL.
REMARK 200 COMPLETENESS FOR RANGE (%) : 92.3
REMARK 200 DATA REDUNDANCY : 3.200
REMARK 200 R MERGE (I) : NULL
REMARK 200 R SYM (I) : 0.04700
REMARK 200 <I/SIGMA(I)> FOR THE DATA SET : 15.0000
But the structure factor file doesn't agree on the number of
reflections:
_reflns.number_all 116714
_reflns.number_obs 7733
A look at the reflections tells a bit more about the discrepancy:
1 1 1 -38 0 3 x ? ? ? ? ? ? ? ?
1 1 1 -38 0 4 x ? ? ? ? ? ? ? ?
1 1 1 -38 0 5 x ? ? ? ? ? ? ? ?
1 1 1 -38 0 6 x ? ? ? ? ? ? ? ?
All but 7733 of the 116714 listed reflections have status x and no
data. It is hard to say whether this is due to a mistake or due to
the wish to deposit a structure factor file without actually
revealing any data, but in any case a simple sanity check would have
detected the discrepancy.
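The count comparison can be sketched in a few lines of Python. This assumes the status flag of every reflection has already been extracted into a list and that the reflection count from the PDB header is known; the function name `check_reflection_counts` and the 5% tolerance are hypothetical choices, not part of any existing validation tool.

```python
from collections import Counter


def check_reflection_counts(statuses, reported_unique, tolerance=0.05):
    """Compare reflections that carry data with the count in the PDB header.

    Assumptions: statuses holds one status flag per listed reflection, and
    flag "x" marks a reflection without usable data, as in the 2P2O file.
    The tolerance is an arbitrary illustrative value.
    """
    if reported_unique <= 0:
        raise ValueError("reported_unique must be positive")
    counts = Counter(statuses)
    total = sum(counts.values())
    with_data = total - counts.get("x", 0)
    relative_gap = abs(with_data - reported_unique) / reported_unique
    return {
        "total": total,
        "with_data": with_data,
        "consistent": relative_gap <= tolerance,
    }
```

With the numbers quoted above (116714 listed reflections, 7733 of them with data, 107292 unique reflections in the PDB file), the result is marked as inconsistent.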
> be given top priority. Any half-way house that would pin its hopes on
> more massaging of reduced data would seem to me pure procrastination,
> as what is accepted as "reduced" today will be seen as "massacred"
> tomorrow.
I think the best way would be a firm decision to go for depositing
images within a well-defined time frame while at the same time
improving the verification of deposited structure factor data.
Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hin...@cnrs-orleans.fr
Web: http://dirac.cnrs-orleans.fr/~hinsen/
---------------------------------------------------------------------