Somebody sent this to me after a previous post a while back -- a sort of
case study:

Wang, J. (2010). Inclusion of weak high-resolution X-ray data for
improvement of a group II intron structure. Acta Crystallographica
Section D: Biological Crystallography 66, 988-1000.

JPK




On Mon, Jan 30, 2012 at 4:03 AM, Frank von Delft
<frank.vonde...@sgc.ox.ac.uk> wrote:
> Hi Randy - thank you for a very interesting reminder of the older literature.
>
> I'm intrigued:  how come this apparently excellent idea has not become
> standard best practice in the 14 years since it was published?
>
> phx
>
>
>
> On 30/01/2012 09:40, Randy Read wrote:
>
> Hi,
>
> Here are a couple of links on the idea of judging resolution by a type of
> cross-validation with data not used in refinement:
>
> Ling et al, 1998: http://pubs.acs.org/doi/full/10.1021/bi971806n
> Brunger et al,
> 2009: http://journals.iucr.org/d/issues/2009/02/00/ba5131/index.html
>   (cites earlier relevant papers from Brunger's group)
>
> Best wishes,
>
> Randy Read
>
> On 30 Jan 2012, at 07:09, arka chakraborty wrote:
>
> Hi all,
>
> In the context of the ongoing discussion above, can anybody post links to a
> few relevant articles?
>
> Thanks in advance,
>
> ARKO
>
> On Mon, Jan 30, 2012 at 3:05 AM, Randy Read <rj...@cam.ac.uk> wrote:
>>
>> Just one thing to add to that very detailed response from Ian.
>>
>> We've tended to use a slightly different approach to determining a
>> sensible resolution cutoff, where we judge whether there's useful
>> information in the highest resolution data by whether it agrees with
>> calculated structure factors computed from a model that hasn't been refined
>> against those data.  We first did this with the complex of the Shiga-like
>> toxin B-subunit pentamer with the Gb3 trisaccharide (Ling et al, 1998).
>> From memory, the point where the average I/sig(I) drops below 2 was around
>> 3.3A.  However, we had a good molecular replacement model to solve this
>> structure and, after just carrying out rigid-body refinement, we computed a
>> SigmaA plot using data to the edge of the detector (somewhere around 2.7A,
>> again from memory).  The SigmaA plot dropped off smoothly to 2.8A
>> resolution, with values well above zero (indicating significantly better
>> than random agreement), then dropped suddenly.  So we chose 2.8A as the
>> cutoff.  Because there were four pentamers in the asymmetric unit, we could
>> then use 20-fold NCS averaging, which gave a fantastic map.  In this case,
>> the averaging certainly helped to pull out something very useful from a very
>> weak signal, because the maps weren't nearly as clear at lower resolution.
>>
>> Since then, a number of other people have applied similar tests.  Notably,
>> Axel Brunger has done some careful analysis to show that it can indeed be
>> useful to take data beyond the conventional limits.
>>
>> When you don't have a great MR model, you can do something similar by
>> limiting the resolution for the initial refinement and rebuilding, then
>> assessing whether there's useful information at higher resolution by using
>> the improved model (which hasn't seen the higher resolution data) to compute
>> Fcalcs.  By the way, it's not necessary to use a SigmaA plot -- the
>> correlation between Fo and Fc probably works just as well.  Note that, when
>> the model has been refined against the lower resolution data, you'll expect
>> a drop in correlation at the resolution cutoff you used for refinement,
>> unless you only use the cross-validation data for the resolution range used
>> in refinement.
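>>
>> To make that concrete, the shell-by-shell comparison needs only a few
>> lines; this is just a sketch (it assumes |Fo|, |Fc| and the resolution
>> of each reflection have already been pulled out of the reflection file
>> with whatever toolkit you like, and that the model has not been refined
>> against these data):
>>
>>     import numpy as np
>>
>>     def cc_by_shell(f_obs, f_calc, d, n_shells=20):
>>         # Correlation of |Fo| and |Fc| in roughly equal-count resolution
>>         # shells; f_obs, f_calc, d are 1D arrays over the same
>>         # reflections, with d the resolution in Angstrom.
>>         order = np.argsort(d)[::-1]          # low -> high resolution
>>         fo, fc, dd = f_obs[order], f_calc[order], d[order]
>>         for idx in np.array_split(np.arange(len(fo)), n_shells):
>>             cc = np.corrcoef(fo[idx], fc[idx])[0, 1]
>>             print("%5.2f-%5.2f A  n=%6d  CC=%5.2f"
>>                   % (dd[idx].max(), dd[idx].min(), len(idx), cc))
>>
>> The resolution at which the per-shell CC drops towards zero plays the
>> same role as the sudden drop in the SigmaA plot.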
>>
>> -----
>> Randy J. Read
>> Department of Haematology, University of Cambridge
>> Cambridge Institute for Medical Research    Tel: +44 1223 336500
>> Wellcome Trust/MRC Building                         Fax: +44 1223 336827
>> Hills Road
>>  E-mail: rj...@cam.ac.uk
>> Cambridge CB2 0XY, U.K.
>> www-structmed.cimr.cam.ac.uk
>>
>> On 29 Jan 2012, at 17:25, Ian Tickle wrote:
>>
>> > Jacob, here's my (personal) take on this:
>> >
>> > The data quality metrics that everyone uses clearly fall into 2
>> > classes: 'consistency' metrics, i.e. Rmerge/meas/pim and CC(1/2) which
>> > measure how well redundant observations agree, and signal/noise ratio
>> > metrics, i.e. mean(I/sigma) and completeness, which relate to the
>> > information content of the data.
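>> >
>> > To keep the definitions in front of us: all of the consistency metrics
>> > are built from the spread of the redundant observations about the
>> > merged mean, whereas mean(I/sigma) uses only the estimated
>> > uncertainties.  A toy version (assuming the observations have already
>> > been grouped by unique hkl; not how any data-processing program
>> > actually does it):
>> >
>> >     import numpy as np
>> >
>> >     def merging_stats(groups):
>> >         # groups: list of (I, sig) pairs of 1D arrays, one per unique hkl
>> >         num_merge = num_meas = num_pim = den = 0.0
>> >         i_over_sig = []
>> >         for I, sig in groups:
>> >             n = len(I)
>> >             dev = np.abs(I - I.mean()).sum()
>> >             num_merge += dev
>> >             if n > 1:
>> >                 num_meas += np.sqrt(n / (n - 1.0)) * dev
>> >                 num_pim += np.sqrt(1.0 / (n - 1.0)) * dev
>> >             den += I.sum()
>> >             # I/sigma of the merged intensity, sigma by error propagation
>> >             i_over_sig.append(I.mean() / (np.sqrt((sig ** 2).sum()) / n))
>> >         return {"Rmerge": num_merge / den, "Rmeas": num_meas / den,
>> >                 "Rpim": num_pim / den,
>> >                 "mean(I/sigma)": float(np.mean(i_over_sig))}
>> >
>> > Note that sig enters only in the last step: the R factors never see it,
>> > which is the point.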
>> >
>> > IMO the basic problem with all the consistency metrics is that they
>> > are not measuring the quantity that is relevant to refinement and
>> > electron density maps, namely the information content of the data, at
>> > least not in a direct and meaningful way.  This is because there are 2
>> > contributors to any consistency metric: the systematic errors (e.g.
>> > differences in illuminated volume and absorption) and the random
>> > errors (from counting statistics, detector noise etc.).  If the data
>> > are collected with sufficient redundancy the systematic errors should
>> > hopefully largely cancel, and therefore only the random errors will
>> > determine the information content.  Therefore the systematic error
>> > component of the consistency measure (which I suspect is the biggest
>> > component, at least for the strong reflections) is not relevant to
>> > measuring the information content.  If the consistency measure only
>> > took into account the random error component (which it can't), then it
>> > would essentially be a measure of information content, if only
>> > indirectly (but then why not simply use a direct measure such as the
>> > signal/noise ratio?).
>> >
>> > There are clearly at least 2 distinct problems with Rmerge: first, it
>> > includes systematic errors in its measure of consistency; second, it is
>> > not invariant with respect to the redundancy (and third, it's useless
>> > as a statistic anyway because you can't do any significance tests on
>> > it!).  The redundancy problem is fixed to some extent by Rpim etc.,
>> > but that still leaves the other problems.  It's not clear to me that
>> > CC(1/2) is any better in this respect, since (as far as I understand
>> > how it's implemented), one cannot be sure that the systematic errors
>> > will cancel for each half-dataset Imean, so it's still likely to
>> > contain a large contribution from the irrelevant systematic error
>> > component and so mislead in respect of the real data quality exactly
>> > in the same way that Rmerge/meas/pim do.  One may as well use the
>> > Rmerge between the half dataset Imeans, since there would be no
>> > redundancy effect (i.e. the redundancy would be 2 for all included
>> > reflections).
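>> >
>> > To spell out what I mean by that, both statistics can be computed from
>> > a random split of each reflection's observations into two halves; a
>> > rough sketch (not how any particular program implements it):
>> >
>> >     import numpy as np
>> >
>> >     def half_dataset_stats(groups, seed=0):
>> >         # groups: list of 1D arrays of redundant I's, one per unique
>> >         # hkl; reflections measured only once are skipped.
>> >         rng = np.random.default_rng(seed)
>> >         m1, m2 = [], []
>> >         for I in groups:
>> >             if len(I) < 2:
>> >                 continue
>> >             I = rng.permutation(I)
>> >             h = len(I) // 2
>> >             m1.append(I[:h].mean())
>> >             m2.append(I[h:].mean())
>> >         m1, m2 = np.array(m1), np.array(m2)
>> >         cc_half = np.corrcoef(m1, m2)[0, 1]
>> >         # one simple "Rmerge between the half-dataset Imeans"
>> >         r_half = np.abs(m1 - m2).sum() / (0.5 * np.abs(m1 + m2)).sum()
>> >         return cc_half, r_half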
>> >
>> > I did some significance tests on CC(1/2) and got silly results: for
>> > example, the test says a CC as low as ~ 0.1 is still significant, but
>> > this corresponded to a huge Rmerge (200%) and a tiny mean(I/sigma)
>> > (0.4).  It seems that (without any basis in statistics whatsoever) the
>> > rule-of-thumb CC > 0.5 is what is generally used, but I would be
>> > worried that the statistics are so far divorced from the reality - it
>> > suggests that something is seriously wrong with the assumptions!
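>> >
>> > For anyone who wants to reproduce that number, the threshold comes
>> > straight from the usual t-test for a correlation coefficient (nothing
>> > specific to CC(1/2) here):
>> >
>> >     import numpy as np
>> >     from scipy import stats
>> >
>> >     def cc_threshold(n_pairs, alpha=0.001):
>> >         # smallest CC significantly > 0 for n_pairs half-dataset means,
>> >         # from t = r * sqrt((n - 2) / (1 - r**2))
>> >         t = stats.t.ppf(1.0 - alpha, df=n_pairs - 2)
>> >         return t / np.sqrt(n_pairs - 2 + t ** 2)
>> >
>> >     print(cc_threshold(1000))   # ~0.1 for a shell of ~1000 reflections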
>> >
>> > Having said all that, the mean(I/sigma) metric, which on the face of
>> > it is much more closely related to the information content and
>> > therefore should be a more relevant metric than Rmerge/meas/pim &
>> > CC(1/2), is not without its own problems (which probably explains the
>> > continuing popularity of the other metrics!).  First and most obvious,
>> > it's a hostage to the estimate of sigma(I) used.  I've never been
>> > happy with inflating the counting sigmas to include effects of
>> > systematic error based on the consistency of redundant measurements,
>> > since as I indicated above if the data are collected redundantly in
>> > such a way that the systematic errors largely cancel, it implies that
>> > the systematic errors should not be included in the estimate of sigma.
>> > The fact that the sigma(I)'s would then generally be smaller (at least
>> > for the large I's), so that the sample variances would be much larger
>> > than the counting variances, is irrelevant, because the former include
>> > the systematic errors.  Also the I/sigma cut-off used would
>> > probably not need to be changed since it affects only the weakest
>> > reflections which are largely unaffected by the systematic error
>> > correction.
>> >
>> > The second problem with mean(I/sigma) is also obvious: i.e. it's a
>> > mean, and as such it's rather insensitive to the actual distribution
>> > of I/sigma(I).  For example if a shell contained a few highly
>> > significant intensities these could be overwhelmed by a large number
>> > of weak data and give an insignificant mean(I/sigma).  It seems to me
>> > that one should be considering the significance of individual
>> > reflections, not the shell averages.  Also the average will depend on
>> > the width of the resolution bin, so one will get the strange effect
>> > that the apparent resolution will depend on how one bins the data!
>> > The assumption being made in taking the bin average is that I/sigma(I)
>> > falls off smoothly with d*, but that's unlikely to be the reality.
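>> >
>> > A toy illustration of the swamping effect (numbers invented):
>> >
>> >     import numpy as np
>> >     rng = np.random.default_rng(1)
>> >     # a shell with 50 clearly significant spots buried in 950 noise ones
>> >     shell = np.concatenate([np.full(50, 6.0),            # I/sigma = 6
>> >                             rng.normal(0.0, 1.0, 950)])   # pure noise
>> >     print(shell.mean())   # ~0.3: the shell average looks insignificant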
>> >
>> > It seems to me that a chi-square statistic which takes into account
>> > the actual distribution of I/sigma(I) would be a better bet than the
>> > bin average, though it's not entirely clear how one would formulate
>> > such a metric.  One would have to consider subsets of the data as a
>> > whole sorted by increasing d* (i.e. not in resolution bins to avoid
>> > the 'bin averaging effect' described above), and apply the resolution
>> > cut-off where the chi-square statistic has maximum probability.  This
>> > would automatically take care of incompleteness effects since all
>> > unmeasured reflections would be included with I/sigma = 0 just for the
>> > purposes of working out the cut-off point.  I've skipped the details
>> > of implementation and I've no idea how it would work in practice!
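>> >
>> > Purely to make the idea concrete, here is one naive way it might look
>> > (my own crude reading of it, with no claim that this is the right
>> > statistic; unmeasured reflections are passed in with I/sigma = 0):
>> >
>> >     import numpy as np
>> >
>> >     def chi2_cutoff(i_over_sig, d):
>> >         # i_over_sig, d: I/sigma(I) and resolution (A) for every
>> >         # reflection, measured or not.  Under pure noise I/sigma is
>> >         # ~ N(0,1), so the sum of (I/sigma)^2 over the first k
>> >         # reflections (sorted from low to high resolution) is
>> >         # chi-square with k degrees of freedom, i.e. mean k and
>> >         # variance 2k.  The standardized excess keeps rising while
>> >         # the data still carry signal and falls once they do not.
>> >         order = np.argsort(d)[::-1]          # low -> high resolution
>> >         z2 = np.cumsum(i_over_sig[order] ** 2)
>> >         k = np.arange(1, len(z2) + 1)
>> >         excess = (z2 - k) / np.sqrt(2.0 * k)
>> >         return d[order][np.argmax(excess)]   # suggested cutoff in A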
>> >
>> > An obvious question is: do we really need to worry about the exact
>> > cut-off anyway, won't our sophisticated maximum likelihood refinement
>> > programs handle the weak data correctly?  Note that in theory weak
>> > intensities should be handled correctly; however, the problem may
>> > instead lie with incorrectly estimated sigmas: these are obviously
>> > much more of an issue for any software which depends critically on
>> > accurate estimates of uncertainty!  I did some tests where I refined
>> > data for a known protein-ligand complex using the original apo model,
>> > and looked at the difference density for the ligand, using data cut at
>> > 2.5, 2 and 1.5 Ang where the standard metrics strongly suggested there
>> > was only data to 2.5 Ang.
>> >
>> > I have to say that the differences were tiny, well below what I would
>> > deem significant (i.e. not only the map resolutions but all the map
>> > details were essentially the same), and certainly I would question
>> > whether it was worth all the soul-searching on this topic over the
>> > years!  So it seems that the refinement programs do indeed handle weak
>> > data correctly, but I guess this should hardly come as a surprise (but
>> > well done to the software developers anyway!).  This was actually
>> > using Buster: Refmac seems to have more of a problem with scaling &
>> > TLS if you include a load of high resolution junk data.  However,
>> > before anyone acts on this information I would _very_ strongly advise
>> > them to repeat the experiment and verify the results for themselves!
>> > The bottom line may be that the actual cut-off used only matters for
>> > the purpose of quoting the true resolution of the map, but it doesn't
>> > significantly affect the appearance of the map itself.
>> >
>> > Finally an effect which confounds all the quality metrics is data
>> > anisotropy: ideally the cut-off surface of significance in reciprocal
>> > space should perhaps be an ellipsoid, not a sphere.  I know there are
>> > several programs for anisotropic scaling, but I'm not aware of any
>> > that apply anisotropic resolution cutoffs (or know whether this would
>> > even be advisable).
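>> >
>> > The geometric part at least would be easy; a sketch, assuming the
>> > principal directions and per-direction limits have come from some
>> > anisotropic scaling program:
>> >
>> >     import numpy as np
>> >
>> >     def inside_ellipsoid(hkl, B, axes, d_limits):
>> >         # hkl:      (N, 3) Miller indices
>> >         # B:        3x3 matrix taking hkl to reciprocal-space
>> >         #           vectors s (units 1/A)
>> >         # axes:     3x3 matrix whose columns are the orthonormal
>> >         #           principal directions of the cutoff ellipsoid
>> >         # d_limits: resolution limit (A) along each principal direction
>> >         # A spherical cutoff keeps |s|*d_min <= 1; here each component
>> >         # of s along a principal axis gets its own limit instead.
>> >         s = hkl @ B.T
>> >         comp = s @ axes
>> >         return np.sum((comp * np.asarray(d_limits)) ** 2, axis=1) <= 1.0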
>> >
>> > Cheers
>> >
>> > -- Ian
>> >
>> > On 27 January 2012 17:47, Jacob Keller <j-kell...@fsm.northwestern.edu>
>> > wrote:
>> >> Dear Crystallographers,
>> >>
>> >> I cannot think why any of the various flavors of Rmerge/meas/pim
>> >> should be used as a data cutoff and not simply I/sigma--can somebody
>> >> make a good argument or point me to a good reference? My thinking is
>> >> that signal:noise of >2 is definitely still signal, no matter what the
>> >> R values are. Am I wrong? I was also thinking that possibly the R value
>> >> cutoff was a historical accident/expedient from when one tried to
>> >> limit the amount of data in the face of limited computational
>> >> power--true? So perhaps now, when the computers are so much more
>> >> powerful, we have the luxury of including more weak data?
>> >>
>> >> JPK
>> >>
>> >>
>> >> --
>> >> *******************************************
>> >> Jacob Pearson Keller
>> >> Northwestern University
>> >> Medical Scientist Training Program
>> >> email: j-kell...@northwestern.edu
>> >> *******************************************
>
>
>
>
> --
>
> ARKA CHAKRABORTY
> CAS in Crystallography and Biophysics
> University of Madras
> Chennai,India
>
>
> ------
> Randy J. Read
> Department of Haematology, University of Cambridge
> Cambridge Institute for Medical Research      Tel: + 44 1223 336500
> Wellcome Trust/MRC Building                   Fax: + 44 1223 336827
> Hills Road                                    E-mail: rj...@cam.ac.uk
> Cambridge CB2 0XY, U.K.                       www-structmed.cimr.cam.ac.uk
>



-- 
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
*******************************************
