Jacob, here's my (personal) take on this:

The data quality metrics that everyone uses clearly fall into two
classes: 'consistency' metrics, i.e. Rmerge/meas/pim and CC(1/2), which
measure how well redundant observations agree, and signal/noise ratio
metrics, i.e. mean(I/sigma) and completeness, which relate to the
information content of the data.
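
For concreteness, here is roughly how the two classes are computed,
sketched in Python/NumPy for a single unique reflection with redundant
observations (purely illustrative; the real programs accumulate the
numerators and denominators over all unique reflections before
dividing):

import numpy as np

def consistency_stats(I):
    """Rmerge, Rmeas and Rpim contributions from one unique reflection
    with redundant observations I (a 1-D array)."""
    I = np.asarray(I, dtype=float)
    n = len(I)
    dev = np.abs(I - I.mean()).sum()
    tot = I.sum()
    rmerge = dev / tot
    rmeas = np.sqrt(n / (n - 1.0)) * dev / tot    # redundancy-corrected
    rpim = np.sqrt(1.0 / (n - 1.0)) * dev / tot   # precision-indicating
    return rmerge, rmeas, rpim

def merged_i_over_sigma(I, sigma):
    """Signal/noise of the merged intensity: the counting sigma of the
    (unweighted) mean shrinks with redundancy, so this tracks the
    information content of the merged data directly."""
    I, sigma = np.asarray(I, float), np.asarray(sigma, float)
    sigma_mean = np.sqrt((sigma ** 2).sum()) / len(I)
    return I.mean() / sigma_mean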

IMO the basic problem with all the consistency metrics is that they
are not measuring the quantity that is relevant to refinement and
electron density maps, namely the information content of the data, at
least not in a direct and meaningful way.  This is because there are 2
contributors to any consistency metric: the systematic errors (e.g.
differences in illuminated volume and absorption) and the random
errors (from counting statistics, detector noise etc.).  If the data
are collected with sufficient redundancy the systematic errors should
hopefully largely cancel, and therefore only the random errors will
determine the information content.  Therefore the systematic error
component of the consistency measure (which I suspect is the biggest
component, at least for the strong reflections) is not relevant to
measuring the information content.  If the consistency measure took
into account only the random error component (which it can't), then it
would essentially be a measure of information content, if only an
indirect one (but then why not simply use a direct measure such as the
signal/noise ratio?).
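
A deliberately idealised toy simulation (entirely my own, just to make
the point) illustrates this: give each unique reflection paired
systematic scale errors of +/-10% - think of absorption hitting
symmetry mates in opposite senses - plus Poisson counting noise.  The
systematic part dominates the per-observation spread (and hence
Rmerge), but cancels in the merged mean, whose accuracy is then set by
the counting statistics alone:

import numpy as np
rng = np.random.default_rng(1)

Itrue, m, n_refl, s = 1000.0, 4, 2000, 0.10   # 8 obs/reflection, 10% systematic
num = den = 0.0
merged_err, sigma_mean = [], []
for _ in range(n_refl):
    # paired systematic scale errors (1+s, 1-s) that cancel on merging
    scales = np.concatenate([np.full(m, 1.0 + s), np.full(m, 1.0 - s)])
    I_obs = rng.poisson(Itrue * scales).astype(float)      # + counting noise
    Imean = I_obs.mean()
    num += np.abs(I_obs - Imean).sum()
    den += I_obs.sum()
    merged_err.append(Imean - Itrue)
    sigma_mean.append(np.sqrt(I_obs.sum()) / len(I_obs))   # counting sigma of <I>

print("Rmerge                       : %.3f" % (num / den))         # ~0.10
print("rms error of merged <I>      : %.1f" % np.std(merged_err))  # ~11
print("counting sigma of merged <I> : %.1f" % np.mean(sigma_mean)) # ~11

So Rmerge reports ~10% while the merged intensities are actually good
to ~1%, and it is only the latter that matters for refinement and maps.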

There are clearly at least two distinct problems with Rmerge: first,
it includes the systematic errors in its measure of consistency;
second, it is not invariant with respect to the redundancy (and third,
it's useless as a statistic anyway because you can't do any
significance tests on it!).  The redundancy problem is fixed to some
extent by Rpim etc., but that still leaves the other problems.  It's
not clear to me that CC(1/2) is any better in this respect, since (as
far as I understand how it's implemented) one cannot be sure that the
systematic errors will cancel within each half-dataset Imean, so it's
still likely to contain a large contribution from the irrelevant
systematic error component and so mislead about the real data quality
in exactly the same way that Rmerge/meas/pim do.  One may as well use
the Rmerge between the half-dataset Imeans, since then there would be
no redundancy effect (i.e. the redundancy would be 2 for all included
reflections).
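
In code the two half-dataset statistics would look something like this
(assuming we already have arrays Ihalf1 and Ihalf2 of the two
half-dataset means for the unique reflections in a shell; the names
are mine):

import numpy as np

def cc_half(Ihalf1, Ihalf2):
    """CC(1/2): Pearson correlation between the two half-dataset means."""
    return np.corrcoef(Ihalf1, Ihalf2)[0, 1]

def r_half(Ihalf1, Ihalf2):
    """Rmerge between the two half-dataset means: every reflection has
    exactly two 'observations', so there is no redundancy dependence."""
    return np.abs(Ihalf1 - Ihalf2).sum() / (Ihalf1 + Ihalf2).sum()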

I did some significance tests on CC(1/2) and got silly results: for
example, the significance threshold for the CC comes out at ~0.1, but
this corresponds to a huge Rmerge (200%) and a tiny mean(I/sigma)
(0.4).  It seems that (without any basis in statistics whatsoever) the
rule of thumb CC > 0.5 is what is generally used, but I would be
worried that the statistics are so far divorced from reality - it
suggests that something is seriously wrong with the assumptions!
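
The kind of test I mean is the standard Student t test for a Pearson
correlation coefficient; a minimal sketch (the shell size n = 1000 is
only an example):

import numpy as np
from scipy import stats

def cc_significance_threshold(n, alpha=0.001):
    """Smallest CC significantly greater than zero at level alpha,
    given n pairs of half-dataset means (one-tailed t test)."""
    t = stats.t.ppf(1.0 - alpha, df=n - 2)
    return t / np.sqrt(n - 2 + t * t)

print(cc_significance_threshold(1000))   # ~0.1 for a shell of ~1000 reflections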

Having said all that, the mean(I/sigma) metric, which on the face of
it is much more closely related to the information content and
therefore should be a more relevant metric than Rmerge/meas/pim &
CC(1/2), is not without its own problems (which probably explains the
continuing popularity of the other metrics!).  First and most obvious,
it's hostage to the estimate of sigma(I) used.  I've never been
happy with inflating the counting sigmas to include the effects of
systematic error, based on the consistency of the redundant
measurements: as I indicated above, if the data are collected
redundantly in such a way that the systematic errors largely cancel,
then those systematic errors should not be included in the estimate of
sigma.  The fact that the sigma(I)'s would then generally be smaller
(at least for the large I's), so that the sample variances would be
much larger than the counting variances, is beside the point, because
the sample variances include the systematic errors.  Also, the I/sigma
cut-off used would probably not need to be changed, since it affects
only the weakest reflections, which are largely unaffected by the
systematic error correction.
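
The sort of inflation I mean is the usual SDFAC/SDB/SDADD-style error
model (I'm quoting the form from memory, so check your scaling
program's documentation for the exact convention):

import numpy as np

def inflate_sigma(sigma_counting, Imean, sdfac=1.1, sdb=0.0, sdadd=0.03):
    """sigma'^2 = SDfac^2 * (sigma^2 + SdB*<I> + (SdAdd*<I>)^2).
    The (SdAdd*<I>)^2 term mainly inflates the sigmas of the strong
    reflections, which is exactly where the systematic errors live; the
    weak reflections near any I/sigma cut-off are barely affected."""
    return sdfac * np.sqrt(sigma_counting ** 2 + sdb * Imean + (sdadd * Imean) ** 2)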

The second problem with mean(I/sigma) is also obvious: it's a mean,
and as such it's rather insensitive to the actual distribution of
I/sigma(I).  For example, if a shell contained a few highly
significant intensities, these could be overwhelmed by a large number
of weak data and give an insignificant mean(I/sigma).  It seems to me
that one should be considering the significance of individual
reflections, not the shell averages.  Also, the average will depend on
the width of the resolution bin, so one gets the strange effect that
the apparent resolution depends on how one bins the data!  The
assumption being made in taking the bin average is that I/sigma(I)
falls off smoothly with d*, but that's unlikely to be the reality.

It seems to me that a chi-square statistic which takes into account
the actual distribution of I/sigma(I) would be a better bet than the
bin average, though it's not entirely clear how one would formulate
such a metric.  One would have to consider cumulative subsets of the
data sorted by increasing d* (i.e. not in resolution bins, to avoid
the 'bin averaging' effect described above), and apply the resolution
cut-off where the chi-square statistic has maximum probability.  This
would automatically take care of incompleteness effects, since all
unmeasured reflections would be included with I/sigma = 0, just for
the purposes of working out the cut-off point.  I've skipped the
details of implementation and I've no idea how it would work in
practice!
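
Purely as a sketch of my own of what such a statistic might look like
(not a worked-out method), something along these lines: sort the
reflections by increasing d*, pad the unmeasured ones with
I/sigma = 0, and for every cumulative subset test the sum of
(I/sigma)^2 against the pure-noise null, cutting where the evidence of
real signal is strongest:

import numpy as np
from scipy import stats

def suggest_cutoff(d_star, i_over_sigma):
    """d_star and i_over_sigma cover ALL reflections expected out to the
    detector edge, with I/sigma = 0 for the unmeasured ones."""
    order = np.argsort(d_star)            # increasing d*, i.e. decreasing d
    z2 = np.asarray(i_over_sigma, float)[order] ** 2
    chi2 = np.cumsum(z2)                  # cumulative chi-square statistic
    dof = np.arange(1, len(z2) + 1)       # degrees of freedom of each subset
    log_p = stats.chi2.logsf(chi2, dof)   # log P(chi2 >= observed | pure noise)
    best = np.argmin(log_p)               # strongest evidence of real signal
    return np.asarray(d_star, float)[order][best]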

An obvious question is: do we really need to worry about the exact
cut-off anyway?  Won't our sophisticated maximum-likelihood refinement
programs handle the weak data correctly?  In theory weak intensities
should be handled correctly; the problem may instead lie with
incorrectly estimated sigmas, which are obviously much more of an
issue for any software that depends critically on accurate estimates
of uncertainty!  I did some tests where I refined data for a known
protein-ligand complex using the original apo model and looked at the
difference density for the ligand, using data cut at 2.5, 2 and 1.5
Ang, where the standard metrics strongly suggested there was only data
to 2.5 Ang.

I have to say that the differences were tiny, well below what I would
deem significant (i.e. not only the map resolutions but all the map
details were essentially the same), and certainly I would question
whether it was worth all the soul-searching on this topic over the
years!  So it seems that the refinement programs do indeed handle
weak data correctly, which I guess should hardly come as a surprise
(but well done to the software developers anyway!).  This was actually
using Buster; Refmac seems to have more of a problem with scaling &
TLS if you include a load of high-resolution junk data.  However,
before anyone acts on this information I would _very_ strongly advise
them to repeat the experiment and verify the results for themselves!
The bottom line may be that the actual cut-off used only matters for
the purpose of quoting the true resolution of the map, but it doesn't
significantly affect the appearance of the map itself.

Finally, an effect which confounds all the quality metrics is data
anisotropy: ideally the cut-off surface of significance in reciprocal
space should perhaps be an ellipsoid, not a sphere.  I know there are
several programs for anisotropic scaling, but I'm not aware of any
that apply anisotropic resolution cut-offs (or even whether that would
be advisable).
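
In the simplest case, with the ellipsoid axes along the reciprocal
axes of an orthorhombic cell, the test itself would be trivial (the
general case needs the principal axes of the anisotropy and the full
orthogonalisation matrix; this is only my own illustration):

import numpy as np

def inside_ellipsoidal_cutoff(hkl, cell, d_limits):
    """hkl: (h, k, l); cell: (a, b, c) in Angstrom (orthorhombic assumed);
    d_limits: (da, db, dc), the resolution limits along a*, b*, c*.
    Returns True if the reflection lies inside the ellipsoidal cut-off."""
    s = np.asarray(hkl, float) / np.asarray(cell, float)   # d* components (h/a, k/b, l/c)
    return float(np.sum((s * np.asarray(d_limits, float)) ** 2)) <= 1.0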

Cheers

-- Ian

On 27 January 2012 17:47, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:
> Dear Crystallographers,
>
> I cannot think why any of the various flavors of Rmerge/meas/pim
> should be used as a data cutoff and not simply I/sigma--can somebody
> make a good argument or point me to a good reference? My thinking is
> that signal:noise of >2 is definitely still signal, no matter what the
> R values are. Am I wrong? I was thinking also possibly the R value
> cutoff was a historical accident/expedient from when one tried to
> limit the amount of data in the face of limited computational
> power--true? So perhaps now, when the computers are so much more
> powerful, we have the luxury of including more weak data?
>
> JPK
>
>
> --
> *******************************************
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> email: j-kell...@northwestern.edu
> *******************************************
