Okay, that /is/ a strong answer: Rmeas has too many infinities for
comfort. Thanks, very instructive yet again!
phx
On 07/07/2017 18:57, James Holton wrote:
I happen to be one of those people who think Rmerge is a very useful
statistic. Not as a method of evaluating the resolution limit, which
is mathematically ridiculous, but for a host of other important
things, like evaluating the performance of data collection equipment,
and evaluating the isomorphism of different crystals, to name a few.
I like Rmerge because it is a simple statistic that has a simple
formula and has not undergone any "corrections". Corrections increase
complexity, and complexity opens the door to manipulation by the
desperate and/or misguided. For example, overzealous outlier
rejection is a common way to abuse R factors, and it is far too often
swept under the rug, sometimes without the user even knowing about
it. This is especially problematic when working in a regime where the
statistic of interest is unstable, and for R factors this is low
intensity data. Rejecting just the right "outliers" can make any R
factor look a lot better. Why would Rmeas be any more unstable than
Rmerge? Look at the formula. There is an "n-1" in the denominator,
where n is the multiplicity. So, what happens when n approaches 1 ?
What happens when n=1? This is not to say Rmerge is better than Rmeas.
In fact, I believe the latter is generally superior to the former,
unless you are working near n = 1. The sqrt(n/(n-1)) factor is trying to
correct for bias in the R statistic, but fighting one infinity with
another infinity is a dangerous game.
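To make that concrete, here is a minimal Python sketch of both
statistics, computed from lists of symmetry-equivalent measurements
(the function name and data layout are mine, purely for illustration,
not code from any actual data-processing package). Note how
reflections measured only once have to be skipped, because the
n/(n-1) term is undefined for them:

    from math import sqrt

    def rmerge_rmeas(reflections):
        """Rmerge and Rmeas over lists of symmetry-equivalent intensities.

        `reflections` is a list of lists; each inner list holds the n repeated
        measurements of one unique reflection.  Reflections with n = 1 are
        skipped: they contribute nothing to Rmerge, and the n/(n-1) factor in
        Rmeas is undefined for them -- exactly the instability discussed above.
        """
        num_merge = num_meas = denom = 0.0
        for obs in reflections:
            n = len(obs)
            if n < 2:
                continue
            mean_i = sum(obs) / n
            dev = sum(abs(i - mean_i) for i in obs)
            num_merge += dev
            num_meas += sqrt(n / (n - 1)) * dev
            denom += sum(obs)
        return num_merge / denom, num_meas / denom

    # one reflection measured 10 times, one measured only twice
    print(rmerge_rmeas([[100, 95, 105, 102, 98, 101, 99, 103, 97, 100],
                        [50, 60]]))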
My point is that neither Rmerge nor Rmeas are easily interpreted
without knowing the multiplicity. If you see Rmeas = 10% and the
multiplicity is 10, then you know what that means. Same for Rmerge,
since at n=10 both stats have nearly the same value. But if you have
Rmeas = 45% and multiplicity = 1.05, what does that mean? Rmeas will
be only 33% if the multiplicity is rounded up to 1.1. This is what I
mean by "numerical instability", the value of the R statistic itself
becomes sensitive to small amounts of noise, and behaves more and more
like a random number generator. And if you have Rmeas = 33% and no
indication of multiplicity, it is hard to know what is going on. I
personally am a lot more comfortable seeing qualitative agreement
between Rmerge and Rmeas, because that means the numerical instability
of the multiplicity correction didn't mess anything up.
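The numbers in that example follow directly from the correction
factor; a throwaway snippet (again just an illustration, same caveats
as above) shows how violently sqrt(n/(n-1)) swings as n approaches 1:

    from math import sqrt

    # the Rmeas multiplicity correction, evaluated at a few multiplicities
    for n in (1.05, 1.1, 2.0, 10.0):
        print(f"n = {n:5.2f}   sqrt(n/(n-1)) = {sqrt(n / (n - 1)):.2f}")
    # n =  1.05 -> 4.58,  n =  1.10 -> 3.32  (same data: Rmeas ~45% vs ~33%)
    # n =  2.00 -> 1.41,  n = 10.00 -> 1.05  (Rmerge and Rmeas nearly agree)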
Of course, when the intensity is weak R statistics in general are not
useful. Both Rmeas and Rmerge have the sum of all intensities in the
denominator, so when the bin-wide sum approaches zero you have another
infinity to contend with. This one starts to rear its ugly head once
I/sigma drops below about 3, and this is why our ancestors always
applied a sigma cutoff before computing an R factor. Our
small-molecule colleagues still do this! They call it "R1". And it
is an excellent indicator of the overall relative error. The relative
error in the outermost bin is not meaningful, and strangely enough
nobody ever reported the outer-resolution Rmerge before 1995.
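As a sketch of that old-fashioned practice (a hypothetical helper
reusing rmerge_rmeas from the first sketch, not the actual
small-molecule R1 recipe): discard the weak measurements before
merging, so the denominator never gets close enough to zero to
inflate the statistic.

    def filter_by_i_over_sigma(reflections, sigmas, cutoff=3.0):
        """Keep only measurements with I/sigma(I) >= cutoff.

        `reflections` and `sigmas` are parallel lists of lists, one inner list
        per unique reflection.  Everything below the cutoff is discarded before
        the R factors are computed.
        """
        kept = []
        for obs, sigs in zip(reflections, sigmas):
            strong = [i for i, s in zip(obs, sigs) if s > 0 and i / s >= cutoff]
            if len(strong) > 1:
                kept.append(strong)
        return kept

    # usage (with made-up numbers):
    # rmerge, rmeas = rmerge_rmeas(filter_by_i_over_sigma(intensities, sigmas))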
For weak signals, Correlation Coefficients are better, but for strong
signals CC pegs out at >95%, making it harder to see relative errors.
I/sigma is what we'd like to know, but the value of "sigma" is still
prone to manipulation by not just outlier rejection, but massaging the
so-called "error model". Suffice it to say, crystallographic data
contain more than one type of error. Some sources are important for
weak spots, others are important for strong spots, and still others
are only apparent in the mid-range. Some sources of error are only
important at low multiplicity, and others only manifest at high
multiplicity. There is no single number that can be used to evaluate
all aspects of data quality.
So, I remain a champion of reporting Rmerge. Not in the high-angle
bin, because that is essentially a random number, but overall Rmerge
and low-angle-bin Rmerge, reported next to multiplicity, Rmeas, CC1/2
and other statistics, are the only way you can glean enough information
about where the errors in the data are coming from. Rmeas is a useful
addition because it helps us correct for multiplicity without having
to do math in our head. Users generally thank you for that. Rmerge,
however, has served us well for more than half a century, and I
believe Uli Arndt knew what he was doing. I hope we all know enough
about history to realize that future generations seldom thank their
ancestors for "protecting" them from information.
-James Holton
MAD Scientist
On 7/5/2017 10:36 AM, Graeme Winter wrote:
Frank,
you are asking me to remove features that I like, so I would feel
that the challenge is for you to prove that this is harmful. However:
- at a minimum, I find it a useful checksum that the stats are
internally consistent (though I interpret it for lots of other
reasons too)
- it is faulty, I agree, but (with caveats) still useful IMHO
Sorry for being terse, but I remain to be convinced that removing it
increases the amount of information.
CC’ing BB as requested
Best wishes Graeme
On 5 Jul 2017, at 17:17, Frank von Delft
<frank.vonde...@sgc.ox.ac.uk> wrote:
You keep not answering the challenge.
It's really simple: what information does Rmerge provide that Rmeas
doesn't?
(If you answer, email to the BB.)
On 05/07/2017 16:04, graeme.win...@diamond.ac.uk wrote:
Dear Frank,
You are forcefully arguing essentially that others are wrong if we
feel an existing statistic continues to be useful, and instead
insist that it be outlawed so that we may not make use of it, just
in case someone misinterprets it.
Very well
I do however express disquiet that we as software developers feel
browbeaten to remove the output we find useful because “the
community” feel that it is obsolete.
I feel that Jacob’s short story on this thread illustrates that
educating the next generation of crystallographers to understand
what all of the numbers mean is critical, and that a numerological
approach of trying to optimise any one statistic is essentially
doomed. Precisely the same argument could be made for people
cutting the “resolution” at the wrong place in order to improve the
average I/sig(I) of the data set.
Denying access to information is not a solution to
misinterpretation, from where I am sat; however, I acknowledge that
other points of view exist.
Best wishes Graeme
On 5 Jul 2017, at 12:11, Frank von Delft
<frank.vonde...@sgc.ox.ac.uk> wrote:
Graeme, Andrew
Jacob is not arguing against an R-based statistic; he's pointing
out that leaving out the multiplicity-weighting is prehistoric
(Diederichs & Karplus published it 20 years ago!).
So indeed: Rmerge, Rpim and I/sigI give different information.
As you say.
But no: Rmerge and Rmeas and Rcryst do NOT give different
information. Except:
* Rmerge is a (potentially) misleading version of Rmeas.
* Rcryst and Rmerge and Rsym are terms that no longer have
significance in the single cryo-dataset world.
phx.
On 05/07/2017 09:43, Andrew Leslie wrote:
I would like to support Graeme in his wish to retain Rmerge in
Table 1, essentially for exactly the same reasons.
I also strongly support Francis Reyes's comment about the usefulness
of Rmerge at low resolution, and I would add to his list that it
can also, in some circumstances, be more indicative of the wrong
choice of symmetry (too high) than the statistics that come from
POINTLESS (excellent though that program is!).
Andrew
On 5 Jul 2017, at 05:44, Graeme Winter
<graeme.win...@gmail.com> wrote:
HI Jacob
Yes, I got this - and I appreciate the benefit of Rmeas for
measuring agreement among small-multiplicity observations.
Having this *as well* is very useful, and I agree Rmeas / Rpim /
CC-half should be the primary “quality” statistics.
However, you asked if there is any reason to *keep* rather than
*eliminate* Rmerge, and I offered one :o)
I do not see what harm there is in reporting Rmerge, even if it is
just used in the inner shell or just used to capture a flavour of
the data set overall. I also appreciate that Rmerge and Rmeas
converge to the same value at large multiplicity, i.e.:
                                          Overall  InnerShell  OuterShell
Low resolution limit                        39.02       39.02        1.39
High resolution limit                        1.35        6.04        1.35
Rmerge (within I+/I-)                       0.080       0.057       2.871
Rmerge (all I+ and I-)                      0.081       0.059       2.922
Rmeas (within I+/I-)                        0.081       0.058       2.940
Rmeas (all I+ & I-)                         0.082       0.059       2.958
Rpim (within I+/I-)                         0.013       0.009       0.628
Rpim (all I+ & I-)                          0.009       0.007       0.453
Rmerge in top intensity bin                 0.050           -           -
Total number of observations              1265512       16212       53490
Total number unique                         17515         224        1280
Mean((I)/sd(I))                              29.7       104.3         1.5
Mn(I) half-set correlation CC(1/2)          1.000       1.000       0.778
Completeness                                100.0        99.7       100.0
Multiplicity                                 72.3        72.4        41.8
Anomalous completeness                      100.0       100.0       100.0
Anomalous multiplicity                       37.2        42.7        21.0
DelAnom correlation between half-sets       0.497       0.766      -0.026
Mid-Slope of Anom Normal Probability        1.039           -           -
(this is a good case for Rpim & CC-half as resolution limit criteria)
If the statistics you want to use are there & some others also,
what is the pressure to remove them? Surely we want to educate on
how best to interpret the entire table above to get a fuller
picture of the overall quality of the data? My 0th-order request
would be to publish the three shells as above ;o)
Cheers Graeme
On 4 Jul 2017, at 22:09, Keller, Jacob
<kell...@janelia.hhmi.org> wrote:
I suggested replacing Rmerge/sym/cryst with Rmeas, not Rpim. Rmeas
is simply Rmerge * sqrt(n/(n-1)), where n is the number of
measurements of that reflection. It's merely a way of correcting
for the multiplicity-related artifact of Rmerge, which is becoming
even more of a problem with data sets of increasing variability in
multiplicity. Consider the case of comparing a data set with a
multiplicity of 2 versus one of 100: equivalent data quality would
yield Rmerges diverging by a factor of ~1.4 (sqrt(2/1) is about
1.41, while sqrt(100/99) is about 1.005). But this has all been
covered before in several papers. It can be and is reported in
resolution bins, so it can be used exactly as you say. So, why not
"disappear" Rmerge from the software?
The only reason I could come up with for keeping it is history, or
comparisons to previous datasets, but anyway those comparisons would
be confounded by variability in multiplicity and a hundred other
things, so come on, developers, just comment it out!
JPK
-----Original Message-----
From:
graeme.win...@diamond.ac.uk
Sent: Tuesday, July 04, 2017 4:37 PM
To: Keller, Jacob
<kell...@janelia.hhmi.org>
Cc: ccp4bb@jiscmail.ac.uk
Subject: Re: [ccp4bb] Rmergicide Through Programming
HI Jacob
An unbiased estimate of the true unmerged I/sig(I) of your data (I
find this particularly useful at low resolution), i.e. if your inner
shell Rmerge is 10% your data agree very poorly; if it is 2%, your
data agree very well, provided you have sensible multiplicity…
obviously this depends on sensible interpretation. Rpim hides this
(though it tells you more about the quality of the average measurement).
Essentially, for I/sig(I) you can (by and large) adjust your sig(I)
values however you like if you were so inclined. You can only
adjust Rmerge by excluding measurements.
I would therefore argue that - amongst the other stats you
enumerate below - it still has a place.
Cheers Graeme
On 4 Jul 2017, at 14:10, Keller, Jacob
<kell...@janelia.hhmi.org> wrote:
"Rmerge does contain information which complements the others."
What information? I was trying to think of a counterargument to
what I proposed, but could not think of a reason in the world to
keep reporting it.
JPK
On 4 Jul 2017, at 12:00, Keller, Jacob
<kell...@janelia.hhmi.org> wrote:
Dear Crystallographers,
Having been repeatedly chagrined by the continued use and
reporting of Rmerge rather than Rmeas or similar, I thought of a
potential way to promote the change: what if merging programs
completely omitted Rmerge/cryst/sym? Is there some reason to continue
to report these stats, or are they just grandfathered into the
software? I doubt that any journal or crystallographer would insist
on reporting Rmerge per se. So, I wonder what developers would
think about commenting out a few lines of their code and seeing what
happens? Maybe a comment to the effect of "Rmerge is now
deprecated; use Rmeas" would be useful as well. Would something
catastrophic happen?
All the best,
Jacob Keller
*******************************************
Jacob Pearson Keller, PhD
Research Scientist
HHMI Janelia Research Campus / Looger lab
Phone: (571)209-4000 x3159
Email: kell...@janelia.hhmi.org
*******************************************