James, I cannot follow you. "n approaches 1" can only mean n = 2, because n is an integer. And for n=2 the sqrt(n/(n-1)) factor is well-defined. For n=1, no contribution to Rmeas, Rmerge, or any other precision indicator can be calculated anyway, because there is nothing this measurement can be compared against.
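To make the per-reflection argument concrete, here is a minimal Python sketch. It is not taken from any merging program: the function name `merging_r_factors` and the toy intensities are invented for illustration, and it assumes the usual definitions in which each unique reflection with n measurements contributes sum|I_i - <I>| to Rmerge and sqrt(n/(n-1)) times that to Rmeas, with the sum of all contributing intensities in the denominator. Reflections with n = 1 are simply skipped.

```python
import math
from typing import Dict, List, Tuple

Hkl = Tuple[int, int, int]


def merging_r_factors(observations: Dict[Hkl, List[float]]) -> Tuple[float, float]:
    """Return (Rmerge, Rmeas) for intensities grouped by unique reflection.

    Reflections measured only once (n = 1) are skipped: there is no second
    measurement to compare against, and sqrt(n/(n-1)) would divide by zero.
    """
    sum_dev = 0.0           # numerator of Rmerge
    sum_weighted_dev = 0.0  # numerator of Rmeas
    sum_intensity = 0.0     # shared denominator
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_i = sum(intensities) / n
        dev = sum(abs(i - mean_i) for i in intensities)
        sum_dev += dev
        sum_weighted_dev += math.sqrt(n / (n - 1)) * dev
        sum_intensity += sum(intensities)
    if sum_intensity == 0.0:
        raise ValueError("no multiply-measured reflections, or intensities sum to zero")
    return sum_dev / sum_intensity, sum_weighted_dev / sum_intensity


# Toy data, invented for this example only.
toy = {
    (1, 0, 0): [100.0, 110.0],                # n = 2
    (0, 1, 0): [50.0],                        # n = 1, contributes nothing
    (0, 0, 1): [200.0, 190.0, 210.0, 200.0],  # n = 4
}
r_merge, r_meas = merging_r_factors(toy)
print(f"Rmerge = {r_merge:.4f}, Rmeas = {r_meas:.4f}")

# The per-reflection factor is largest at n = 2 and tends to 1 as n grows.
for n in (2, 3, 10, 100):
    print(n, round(math.sqrt(n / (n - 1)), 3))
```

In this sketch the factor is only ever evaluated for integer n >= 2, so it never exceeds sqrt(2). If every reflection is measured exactly twice, Rmeas comes out as sqrt(2) times Rmerge.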
just my 2 cents,
Kay

On Fri, 7 Jul 2017 10:57:17 -0700, James Holton <jmhol...@slac.stanford.edu> wrote:
>I happen to be one of those people who think Rmerge is a very useful
>statistic. Not as a method of evaluating the resolution limit, which is
>mathematically ridiculous, but for a host of other important things,
>like evaluating the performance of data collection equipment, and
>evaluating the isomorphism of different crystals, to name a few.
>
>I like Rmerge because it is a simple statistic that has a simple formula
>and has not undergone any "corrections". Corrections increase
>complexity, and complexity opens the door to manipulation by the
>desperate and/or misguided. For example, overzealous outlier rejection
>is a common way to abuse R factors, and it is far too often swept under
>the rug, sometimes without the user even knowing about it. This is
>especially problematic when working in a regime where the statistic of
>interest is unstable, and for R factors this is low intensity data.
>Rejecting just the right "outliers" can make any R factor look a lot
>better. Why would Rmeas be any more unstable than Rmerge? Look at the
>formula. There is an "n-1" in the denominator, where n is the
>multiplicity. So, what happens when n approaches 1 ? What happens when
>n=1? This is not to say Rmerge is better than Rmeas. In fact, I believe
>the latter is generally superior to the first, unless you are working
>near n = 1. The sqrt(n/(n-1)) is trying to correct for bias in the R
>statistic, but fighting one infinity with another infinity is a
>dangerous game.
>
>My point is that neither Rmerge nor Rmeas are easily interpreted without
>knowing the multiplicity. If you see Rmeas = 10% and the multiplicity
>is 10, then you know what that means. Same for Rmerge, since at n=10
>both stats have nearly the same value. But if you have Rmeas = 45% and
>multiplicity = 1.05, what does that mean? Rmeas will be only 33% if the
>multiplicity is rounded up to 1.1. This is what I mean by "numerical
>instability", the value of the R statistic itself becomes sensitive to
>small amounts of noise, and behaves more and more like a random number
>generator. And if you have Rmeas = 33% and no indication of
>multiplicity, it is hard to know what is going on. I personally am a
>lot more comfortable seeing qualitative agreement between Rmerge and
>Rmeas, because that means the numerical instability of the multiplicity
>correction didn't mess anything up.
>
>Of course, when the intensity is weak R statistics in general are not
>useful. Both Rmeas and Rmerge have the sum of all intensities in the
>denominator, so when the bin-wide sum approaches zero you have another
>infinity to contend with. This one starts to rear its ugly head once
>I/sigma drops below about 3, and this is why our ancestors always
>applied a sigma cutoff before computing an R factor. Our small-molecule
>colleagues still do this! They call it "R1". And it is an excellent
>indicator of the overall relative error. The relative error in the
>outermost bin is not meaningful, and strangely enough nobody ever
>reported the outer-resolution Rmerge before 1995.
>
>For weak signals, Correlation Coefficients are better, but for strong
>signals CC pegs out at >95%, making it harder to see relative errors.
>I/sigma is what we'd like to know, but the value of "sigma" is still
>prone to manipulation by not just outlier rejection, but massaging the
>so-called "error model". Suffice it to say, crystallographic data
>contain more than one type of error. Some sources are important for
>weak spots, others are important for strong spots, and still others are
>only apparent in the mid-range. Some sources of error are only
>important at low multiplicity, and others only manifest at high
>multiplicity. There is no single number that can be used to evaluate all
>aspects of data quality.
>
>So, I remain a champion of reporting Rmerge. Not in the high-angle bin,
>because that is essentially a random number, but overall Rmerge and
>low-angle-bin Rmerge next to multiplicity, Rmeas, CC1/2 and other
>statistics is the only way you can glean enough information about where
>the errors are coming from in the data. Rmeas is a useful addition
>because it helps us correct for multiplicity without having to do math
>in our head. Users generally thank you for that. Rmerge, however, has
>served us well for more than half a century, and I believe Uli Arndt
>knew what he was doing. I hope we all know enough about history to
>realize that future generations seldom thank their ancestors for
>"protecting" them from information.
>
>-James Holton
>MAD Scientist
>
>
>On 7/5/2017 10:36 AM, Graeme Winter wrote:
>> Frank,
>>
>> you are asking me to remove features that I like, so I would feel that the
>> challenge is for you to prove that this is harmful however:
>>
>> - at the minimum, I find it a useful check sum that the stats are
>> internally consistent (though I interpret it for lots of other reasons too)
>> - it is faulty I agree, but (with caveats) still useful IMHO
>>
>> Sorry for being terse, but I remain to be convinced that removing it
>> increases the amount of information
>>
>> CC’ing BB as requested
>>
>> Best wishes Graeme
>>
>>
>>> On 5 Jul 2017, at 17:17, Frank von Delft <frank.vonde...@sgc.ox.ac.uk> wrote:
>>>
>>> You keep not answering the challenge.
>>>
>>> It's really simple: what information does Rmerge provide that Rmeas
>>> doesn't.
>>>
>>> (If you answer, email to the BB.)
>>>
>>>
>>> On 05/07/2017 16:04, graeme.win...@diamond.ac.uk wrote:
>>>> Dear Frank,
>>>>
>>>> You are forcefully arguing essentially that others are wrong if we feel an
>>>> existing statistic continues to be useful, and instead insist that it be
>>>> outlawed so that we may not make use of it, just in case someone
>>>> misinterprets it.
>>>>
>>>> Very well
>>>>
>>>> I do however express disquiet that we as software developers feel
>>>> browbeaten to remove the output we find useful because “the community”
>>>> feel that it is obsolete.
>>>>
>>>> I feel that Jacob’s short story on this thread illustrates that educating
>>>> the next generation of crystallographers to understand what all of the
>>>> numbers mean is critical, and that a numerological approach of trying to
>>>> optimise any one statistic is essentially doomed. Precisely the same
>>>> argument could be made for people cutting the “resolution” at the wrong
>>>> place in order to improve the average I/sig(I) of the data set.
>>>>
>>>> Denying access to information is not a solution to misinterpretation, from
>>>> where I am sat, however I acknowledge that other points of view exist.
>>>>
>>>> Best wishes Graeme
>>>>
>>>>
>>>> On 5 Jul 2017, at 12:11, Frank von Delft
>>>> <frank.vonde...@sgc.ox.ac.uk> wrote:
>>>>
>>>>
>>>> Graeme, Andrew
>>>>
>>>> Jacob is not arguing against an R-based statistic; he's pointing out that
>>>> leaving out the multiplicity-weighting is prehistoric (Diederichs &
>>>> Karplus published it 20 years ago!).
>>>>
>>>> So indeed: Rmerge, Rpim and I/sigI give different information. As you
>>>> say.
>>>>
>>>> But no: Rmerge and Rmeas and Rcryst do NOT give different information.
>>>> Except:
>>>>
>>>> * Rmerge is a (potentially) misleading version of Rmeas.
>>>>
>>>> * Rcryst and Rmerge and Rsym are terms that no longer have significance
>>>> in the single cryo-dataset world.
>>>>
>>>> phx.
>>>>
>>>>
>>>> On 05/07/2017 09:43, Andrew Leslie wrote:
>>>>
>>>> I would like to support Graeme in his wish to retain Rmerge in Table 1,
>>>> essentially for exactly the same reasons.
>>>>
>>>> I also strongly support Francis Reyes' comment about the usefulness of
>>>> Rmerge at low resolution, and I would add to his list that it can also, in
>>>> some circumstances, be more indicative of the wrong choice of symmetry
>>>> (too high) than the statistics that come from POINTLESS (excellent though
>>>> that program is!).
>>>>
>>>> Andrew
>>>>
>>>> On 5 Jul 2017, at 05:44, Graeme Winter
>>>> <graeme.win...@gmail.com> wrote:
>>>>
>>>> Hi Jacob
>>>>
>>>> Yes, I got this - and I appreciate the benefit of Rmeas for dealing with
>>>> measuring agreement for small-multiplicity observations. Having this *as
>>>> well* is very useful and I agree Rmeas / Rpim / CC-half should be the
>>>> primary “quality” statistics.
>>>>
>>>> However, you asked if there is any reason to *keep* rather than
>>>> *eliminate* Rmerge, and I offered one :o)
>>>>
>>>> I do not see what harm there is reporting Rmerge, even if it is just used
>>>> in the inner shell or just used to capture a flavour of the data set
>>>> overall. I also appreciate that Rmeas converges to the same value for
>>>> large multiplicity i.e.:
>>>>
>>>>                                           Overall InnerShell OuterShell
>>>> Low resolution limit                        39.02      39.02       1.39
>>>> High resolution limit                        1.35       6.04       1.35
>>>>
>>>> Rmerge (within I+/I-)                       0.080      0.057      2.871
>>>> Rmerge (all I+ and I-)                      0.081      0.059      2.922
>>>> Rmeas (within I+/I-)                        0.081      0.058      2.940
>>>> Rmeas (all I+ & I-)                         0.082      0.059      2.958
>>>> Rpim (within I+/I-)                         0.013      0.009      0.628
>>>> Rpim (all I+ & I-)                          0.009      0.007      0.453
>>>> Rmerge in top intensity bin                 0.050          -          -
>>>> Total number of observations              1265512      16212      53490
>>>> Total number unique                         17515        224       1280
>>>> Mean((I)/sd(I))                              29.7      104.3        1.5
>>>> Mn(I) half-set correlation CC(1/2)          1.000      1.000      0.778
>>>> Completeness                                100.0       99.7      100.0
>>>> Multiplicity                                 72.3       72.4       41.8
>>>>
>>>> Anomalous completeness                      100.0      100.0      100.0
>>>> Anomalous multiplicity                       37.2       42.7       21.0
>>>> DelAnom correlation between half-sets       0.497      0.766     -0.026
>>>> Mid-Slope of Anom Normal Probability        1.039          -          -
>>>>
>>>> (this is a good case for Rpim & CC-half as resolution limit criteria)
>>>>
>>>> If the statistics you want to use are there & some others also, what is
>>>> the pressure to remove them? Surely we want to educate on how best to
>>>> interpret the entire table above to get a fuller picture of the overall
>>>> quality of the data? My 0th-order request would be to publish the three
>>>> shells as above ;o)
>>>>
>>>> Cheers Graeme
>>>>
>>>>
>>>> On 4 Jul 2017, at 22:09, Keller, Jacob
>>>> <kell...@janelia.hhmi.org> wrote:
>>>>
>>>> I suggested replacing Rmerge/sym/cryst with Rmeas, not Rpim. Rmeas is
>>>> simply (Rmerge * sqrt(n/(n-1))) where n is the number of measurements of
>>>> that reflection. It's merely a way of correcting for the
>>>> multiplicity-related artifact of Rmerge, which is becoming even more of a
>>>> problem with data sets of increasing variability in multiplicity. Consider
>>>> the case of comparing a data set with a multiplicity of 2 versus one of
>>>> 100: equivalent data quality would yield Rmerges diverging by a factor of
>>>> ~1.4. But this has all been covered before in several papers. It can be
>>>> and is reported in resolution bins, so can be used exactly as you say. So,
>>>> why not "disappear" Rmerge from the software?
>>>>
>>>> The only reason I could come up with for keeping it is historical reasons
>>>> or comparisons to previous datasets, but anyway those comparisons would be
>>>> confounded by variabilities in multiplicity and a hundred other things, so
>>>> come on, developers, just comment it out!
>>>>
>>>> JPK
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: graeme.win...@diamond.ac.uk
>>>> Sent: Tuesday, July 04, 2017 4:37 PM
>>>> To: Keller, Jacob <kell...@janelia.hhmi.org>
>>>> Cc: ccp4bb@jiscmail.ac.uk
>>>> Subject: Re: [ccp4bb] Rmergicide Through Programming
>>>>
>>>> Hi Jacob
>>>>
>>>> Unbiased estimate of the true unmerged I/sig(I) of your data (I find this
>>>> particularly useful at low resolution) i.e. if your inner shell Rmerge is
>>>> 10% your data agree very poorly; if 2% says your data agree very well
>>>> provided you have sensible multiplicity… obviously depends on sensible
>>>> interpretation. Rpim hides this (though tells you more about the quality
>>>> of average measurement)
>>>>
>>>> Essentially, for I/sig(I) you can (by and large) adjust your sig(I) values
>>>> however you like if you were so inclined. You can only adjust Rmerge by
>>>> excluding measurements.
>>>>
>>>> I would therefore defend that - amongst the other stats you enumerate
>>>> below - it still has a place
>>>>
>>>> Cheers Graeme
>>>>
>>>> On 4 Jul 2017, at 14:10, Keller, Jacob
>>>> <kell...@janelia.hhmi.org> wrote:
>>>>
>>>> Rmerge does contain information which complements the others.
>>>>
>>>> What information? I was trying to think of a counterargument to what I
>>>> proposed, but could not think of a reason in the world to keep reporting
>>>> it.
>>>>
>>>> JPK
>>>>
>>>>
>>>> On 4 Jul 2017, at 12:00, Keller, Jacob
>>>> <kell...@janelia.hhmi.org> wrote:
>>>>
>>>> Dear Crystallographers,
>>>>
>>>> Having been repeatedly chagrined about the continued use and reporting of
>>>> Rmerge rather than Rmeas or similar, I thought of a potential way to
>>>> promote the change: what if merging programs would completely omit
>>>> Rmerge/cryst/sym? Is there some reason to continue to report these stats,
>>>> or are they just grandfathered into the software? I doubt that any journal
>>>> or crystallographer would insist on reporting Rmerge per se. So, I wonder
>>>> what developers would think about commenting out a few lines of their
>>>> code, seeing what happens? Maybe a comment to the effect of "Rmerge is now
>>>> deprecated; use Rmeas" would be useful as well. Would something
>>>> catastrophic happen?
>>>>
>>>> All the best,
>>>>
>>>> Jacob Keller
>>>>
>>>> *******************************************
>>>> Jacob Pearson Keller, PhD
>>>> Research Scientist
>>>> HHMI Janelia Research Campus / Looger lab
>>>> Phone: (571)209-4000 x3159
>>>> Email: kell...@janelia.hhmi.org
>>>> *******************************************