Hi Randy - thank you for a very interesting reminder of this older literature.

I'm intrigued: how come this apparently excellent idea has not become standard best practice in the 14 years since it was published?

phx


On 30/01/2012 09:40, Randy Read wrote:
Hi,

Here are a couple of links on the idea of judging resolution by a type of cross-validation with data not used in refinement:

Ling et al, 1998: http://pubs.acs.org/doi/full/10.1021/bi971806n
Brunger et al, 2008: http://journals.iucr.org/d/issues/2009/02/00/ba5131/index.html
  (cites earlier relevant papers from Brunger's group)

Best wishes,

Randy Read

On 30 Jan 2012, at 07:09, arka chakraborty wrote:

Hi all,

In the context of the ongoing discussion above, can anybody post links to a few relevant articles?

Thanks in advance,

ARKO

On Mon, Jan 30, 2012 at 3:05 AM, Randy Read <rj...@cam.ac.uk> wrote:

    Just one thing to add to that very detailed response from Ian.

    We've tended to use a slightly different approach to determining
    a sensible resolution cutoff, where we judge whether there's
    useful information in the highest resolution data by whether it
    agrees with calculated structure factors computed from a model
    that hasn't been refined against those data.  We first did this
    with the complex of the Shiga-like toxin B-subunit pentamer with
    the Gb3 trisaccharide (Ling et al, 1998).  From memory, the point
    where the average I/sig(I) drops below 2 was around 3.3A.
     However, we had a good molecular replacement model to solve this
    structure and, after just carrying out rigid-body refinement, we
    computed a SigmaA plot using data to the edge of the detector
    (somewhere around 2.7A, again from memory).  The SigmaA plot
    dropped off smoothly to 2.8A resolution, with values well above
    zero (indicating significantly better than random agreement),
    then dropped suddenly.  So we chose 2.8A as the cutoff.  Because
    there were four pentamers in the asymmetric unit, we could then
    use 20-fold NCS averaging, which gave a fantastic map.  In this
    case, the averaging certainly helped to pull out something very
    useful from a very weak signal, because the maps weren't nearly
    as clear at lower resolution.

    Since then, a number of other people have applied similar tests.
     Notably, Axel Brunger has done some careful analysis to show
    that it can indeed be useful to take data beyond the conventional
    limits.

    When you don't have a great MR model, you can do something
    similar by limiting the resolution for the initial refinement and
    rebuilding, then assessing whether there's useful information at
    higher resolution by using the improved model (which hasn't seen
    the higher resolution data) to compute Fcalcs.  By the way, it's
    not necessary to use a SigmaA plot -- the correlation between Fo
    and Fc probably works just as well.  Note that, when the model
    has been refined against the lower resolution data, you'll expect
    a drop in correlation at the resolution cutoff you used for
    refinement, unless you only use the cross-validation data for the
    resolution range used in refinement.
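
    As a rough sketch of that check (a hypothetical script, not any
    particular program: it assumes you have exported d-spacing, Fo and
    Fc -- the latter from a model refined only against the
    lower-resolution data -- as three columns in a text file):

        import numpy as np

        # columns: d-spacing (A), Fobs, Fcalc from the "unexposed" model
        d, fo, fc = np.loadtxt("fo_fc.dat", unpack=True)

        # sort from low to high resolution, then split into shells of
        # roughly equal reflection count
        order = np.argsort(-d)
        d, fo, fc = d[order], fo[order], fc[order]
        for shell in np.array_split(np.arange(d.size), 20):
            cc = np.corrcoef(fo[shell], fc[shell])[0, 1]
            print(f"{d[shell].max():5.2f}-{d[shell].min():5.2f} A"
                  f"  CC(Fo,Fc) = {cc:5.2f}")

    A CC(Fo,Fc) that stays well above zero past the conventional
    I/sigma cut-off and then drops sharply suggests the same behaviour
    as the SigmaA plot described above.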

    -----
    Randy J. Read
    Department of Haematology, University of Cambridge
    Cambridge Institute for Medical Research    Tel: +44 1223 336500
    Wellcome Trust/MRC Building                 Fax: +44 1223 336827
    Hills Road                                  E-mail: rj...@cam.ac.uk
    Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk

    On 29 Jan 2012, at 17:25, Ian Tickle wrote:

    > Jacob, here's my (personal) take on this:
    >
    > The data quality metrics that everyone uses clearly fall into 2
    > classes: 'consistency' metrics, i.e. Rmerge/meas/pim and CC(1/2)
    > which measure how well redundant observations agree, and
    > signal/noise ratio metrics, i.e. mean(I/sigma) and completeness,
    > which relate to the information content of the data.
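    >
    > (For concreteness, a minimal sketch in Python of the two classes of
    > metric -- assuming, purely hypothetically, that the unmerged
    > observations are held as a dict mapping each hkl to a list of
    > (I, sigI) pairs, not any particular program's data layout:)
    >
    >     from math import sqrt
    >
    >     def merging_stats(obs):
    >         """obs: {hkl: [(I, sigI), ...]}, multiplicity >= 2 per hkl."""
    >         num_merge = num_pim = den = 0.0
    >         isig = []
    >         for refl in obs.values():
    >             n = len(refl)
    >             imean = sum(i for i, _ in refl) / n
    >             dev = sum(abs(i - imean) for i, _ in refl)
    >             num_merge += dev                      # Rmerge numerator
    >             num_pim += sqrt(1.0 / (n - 1)) * dev  # Rpim numerator
    >             den += sum(i for i, _ in refl)
    >             sig_mean = sqrt(sum(s * s for _, s in refl)) / n
    >             isig.append(imean / sig_mean)         # signal/noise of the merged I
    >         # two consistency metrics versus one signal/noise metric
    >         return num_merge / den, num_pim / den, sum(isig) / len(isig)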
    >
    > IMO the basic problem with all the consistency metrics is that they
    > are not measuring the quantity that is relevant to refinement and
    > electron density maps, namely the information content of the data,
    > at least not in a direct and meaningful way.  This is because there
    > are 2 contributors to any consistency metric: the systematic errors
    > (e.g. differences in illuminated volume and absorption) and the
    > random errors (from counting statistics, detector noise etc.).  If
    > the data are collected with sufficient redundancy the systematic
    > errors should hopefully largely cancel, and therefore only the
    > random errors will determine the information content.  Therefore
    > the systematic error component of the consistency measure (which I
    > suspect is the biggest component, at least for the strong
    > reflections) is not relevant to measuring the information content.
    > If the consistency measure only took into account the random error
    > component (which it can't), then it would essentially be a measure
    > of information content, if only indirectly (but then why not simply
    > use a direct measure such as the signal/noise ratio?).
    >
    > There are clearly at least 2 distinct problems with Rmerge: first,
    > it includes systematic errors in its measure of consistency;
    > second, it's not invariant with respect to the redundancy (and
    > third, it's useless as a statistic anyway because you can't do any
    > significance tests on it!).  The redundancy problem is fixed to
    > some extent with Rpim etc, but that still leaves the other
    > problems.  It's not clear to me that CC(1/2) is any better in this
    > respect, since (as far as I understand how it's implemented) one
    > cannot be sure that the systematic errors will cancel for each
    > half-dataset Imean, so it's still likely to contain a large
    > contribution from the irrelevant systematic error component and so
    > mislead in respect of the real data quality in exactly the same way
    > that Rmerge/meas/pim do.  One may as well use the Rmerge between
    > the half-dataset Imeans, since there would be no redundancy effect
    > (i.e. the redundancy would be 2 for all included reflections).
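    >
    > (In the same spirit, and with the same hypothetical data layout as
    > in the sketch above, CC(1/2) and the half-dataset Rmerge suggested
    > here could be estimated roughly like this:)
    >
    >     import random
    >     import numpy as np
    >
    >     def half_dataset_stats(obs, seed=0):
    >         rng = random.Random(seed)
    >         h1, h2 = [], []
    >         for refl in obs.values():
    >             refl = list(refl)
    >             if len(refl) < 2:
    >                 continue                  # need both halves populated
    >             rng.shuffle(refl)
    >             half = len(refl) // 2
    >             h1.append(np.mean([i for i, _ in refl[:half]]))
    >             h2.append(np.mean([i for i, _ in refl[half:]]))
    >         h1, h2 = np.array(h1), np.array(h2)
    >         cc_half = np.corrcoef(h1, h2)[0, 1]               # CC(1/2)
    >         r_half = np.abs(h1 - h2).sum() / (h1 + h2).sum()  # Rmerge between half-set Imeans
    >         return cc_half, r_half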
    >
    > I did some significance tests on CC(1/2) and I got silly results:
    > for example, they say that a CC of only ~ 0.1 is already
    > statistically significant, but this corresponded to a huge Rmerge
    > (200%) and a tiny mean(I/sigma) (0.4).  It seems that (without any
    > basis in statistics whatsoever) the rule-of-thumb CC > 0.5 is what
    > is generally used, but I would be worried that the statistics are
    > so far divorced from the reality - it suggests that something is
    > seriously wrong with the assumptions!
    >
    > Having said all that, the mean(I/sigma) metric, which on the face
    > of it is much more closely related to the information content and
    > therefore should be a more relevant metric than Rmerge/meas/pim &
    > CC(1/2), is not without its own problems (which probably explains
    > the continuing popularity of the other metrics!).  First and most
    > obvious, it's a hostage to the estimate of sigma(I) used.  I've
    > never been happy with inflating the counting sigmas to include
    > effects of systematic error based on the consistency of redundant
    > measurements, since, as I indicated above, if the data are
    > collected redundantly in such a way that the systematic errors
    > largely cancel, it implies that the systematic errors should not be
    > included in the estimate of sigma.  The fact that the sigma(I)'s
    > would then generally be smaller (at least for the large I's), so
    > that the sample variances would be much larger than the counting
    > variances, is irrelevant, because the sample variances include the
    > systematic errors.  Also, the I/sigma cut-off used would probably
    > not need to be changed, since it affects only the weakest
    > reflections, which are largely unaffected by the systematic error
    > correction.
    >
    > The second problem with mean(I/sigma) is also obvious: it's a mean,
    > and as such it's rather insensitive to the actual distribution of
    > I/sigma(I).  For example, if a shell contained a few highly
    > significant intensities, these could be overwhelmed by a large
    > number of weak data and give an insignificant mean(I/sigma).  It
    > seems to me that one should be considering the significance of
    > individual reflections, not the shell averages.  Also, the average
    > will depend on the width of the resolution bin, so one will get the
    > strange effect that the apparent resolution will depend on how one
    > bins the data!  The assumption being made in taking the bin average
    > is that I/sigma(I) falls off smoothly with d*, but that's unlikely
    > to be the reality.
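    >
    > (A toy illustration of the averaging problem -- the numbers are
    > made up purely to show the effect:)
    >
    >     import numpy as np
    >
    >     shell = np.concatenate([np.full(20, 10.0),   # 20 reflections at I/sigma = 10
    >                             np.full(980, 0.2)])  # 980 that are essentially noise
    >     print(shell.mean())       # ~0.4: an "insignificant" shell average
    >     print((shell > 3).sum())  # yet 20 individually significant reflections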
    >
    > It seems to me that a chi-square statistic which takes into account
    > the actual distribution of I/sigma(I) would be a better bet than
    > the bin average, though it's not entirely clear how one would
    > formulate such a metric.  One would have to consider subsets of the
    > data as a whole sorted by increasing d* (i.e. not in resolution
    > bins, to avoid the 'bin averaging effect' described above), and
    > apply the resolution cut-off where the chi-square statistic has
    > maximum probability.  This would automatically take care of
    > incompleteness effects, since all unmeasured reflections would be
    > included with I/sigma = 0 just for the purposes of working out the
    > cut-off point.  I've skipped the details of implementation and I've
    > no idea how it would work in practice!
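    >
    > (One possible -- purely hypothetical -- reading of that suggestion:
    > treat the data beyond each candidate cut-off as the test set, with
    > unmeasured reflections entered as I/sigma = 0, and cut where the
    > remaining tail is no longer distinguishable from pure noise.  A
    > sketch only, assuming I/sigma behaves roughly as a standard normal
    > deviate in the absence of signal:)
    >
    >     import numpy as np
    >     from scipy.stats import chi2
    >
    >     def suggest_cutoff(d, i_over_sig, alpha=0.01):
    >         """d, i_over_sig: arrays over all reflections out to the
    >         detector edge, unmeasured reflections included as 0."""
    >         order = np.argsort(-d)        # sort by increasing d*
    >         d, x = d[order], i_over_sig[order]
    >         tail_chisq = np.cumsum((x ** 2)[::-1])[::-1]  # sum of (I/sig)^2 beyond each point
    >         dof = np.arange(len(x), 0, -1)
    >         p = chi2.sf(tail_chisq, dof)  # prob. of such a tail from noise alone
    >         keep = p < alpha              # tail still carries signal
    >         return d[keep].min() if keep.any() else d.max()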
    >
    > An obvious question is: do we really need to worry about the exact
    > cut-off anyway; won't our sophisticated maximum-likelihood
    > refinement programs handle the weak data correctly?  Note that in
    > theory weak intensities should be handled correctly; however, the
    > problem may instead lie with incorrectly estimated sigmas: these
    > are obviously much more of an issue for any software which depends
    > critically on accurate estimates of uncertainty!  I did some tests
    > where I refined data for a known protein-ligand complex using the
    > original apo model, and looked at the difference density for the
    > ligand, using data cut at 2.5, 2 and 1.5 Ang where the standard
    > metrics strongly suggested there was only data to 2.5 Ang.
    >
    > I have to say that the differences were tiny, well below what I
    > would deem significant (i.e. not only the map resolutions but all
    > the map details were essentially the same), and certainly I would
    > question whether it was worth all the soul-searching on this topic
    > over the years!  So it seems that the refinement programs do indeed
    > handle weak data correctly, but I guess this should hardly come as
    > a surprise (but well done to the software developers anyway!).
    > This was actually using Buster: Refmac seems to have more of a
    > problem with scaling & TLS if you include a load of high resolution
    > junk data.  However, before anyone acts on this information I would
    > _very_ strongly advise them to repeat the experiment and verify the
    > results for themselves!  The bottom line may be that the actual
    > cut-off used only matters for the purpose of quoting the true
    > resolution of the map, but it doesn't significantly affect the
    > appearance of the map itself.
    >
    > Finally an effect which confounds all the quality metrics is data
    > anisotropy: ideally the cut-off surface of significance in
    > reciprocal space should perhaps be an ellipsoid, not a sphere.  I
    > know there are several programs for anisotropic scaling, but I'm
    > not aware of any that apply anisotropic resolution cutoffs (or even
    > whether this would be advisable).
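    >
    > (For what it's worth, an ellipsoidal cut-off is at least easy to
    > state: with resolution limits d_a, d_b, d_c along the ellipsoid's
    > principal directions, keep a reflection whose scattering-vector
    > components along those directions are (sa, sb, sc) if the quadratic
    > form below is <= 1.  A sketch only, assuming orthogonal principal
    > axes:)
    >
    >     def inside_ellipsoid(sa, sb, sc, d_a, d_b, d_c):
    >         """sa, sb, sc: components of the scattering vector (1/A)
    >         along the ellipsoid axes; d_a, d_b, d_c: resolution limits
    >         (A) along those axes."""
    >         return (sa * d_a) ** 2 + (sb * d_b) ** 2 + (sc * d_c) ** 2 <= 1.0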
    >
    > Cheers
    >
    > -- Ian
    >
    > On 27 January 2012 17:47, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:
    >> Dear Crystallographers,
    >>
    >> I cannot think why any of the various flavors of Rmerge/meas/pim
    >> should be used as a data cutoff and not simply I/sigma--can somebody
    >> make a good argument or point me to a good reference? My thinking is
    >> that signal:noise of >2 is definitely still signal, no matter what the
    >> R values are. Am I wrong? I was thinking also possibly the R value
    >> cutoff was a historical accident/expedient from when one tried to
    >> limit the amount of data in the face of limited computational
    >> power--true? So perhaps now, when the computers are so much more
    >> powerful, we have the luxury of including more weak data?
    >>
    >> JPK
    >>
    >>
    >> --
    >> *******************************************
    >> Jacob Pearson Keller
    >> Northwestern University
    >> Medical Scientist Training Program
    >> email: j-kell...@northwestern.edu
    >> *******************************************




--

ARKA CHAKRABORTY
CAS in Crystallography and Biophysics
University of Madras
Chennai, India


------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research      Tel: + 44 1223 336500
Wellcome Trust/MRC Building                   Fax: + 44 1223 336827
Hills Road                                    E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                       www-structmed.cimr.cam.ac.uk
