I thought that as author of Scala I might put in my 2 penn'th to this discussion, FWIW

1. I've never been able to find any useful distinction between Rsym & Rmerge, and when filling in the PDB's request for both (undefined by them and irritatingly restricted to < 0.99, at least in Autodep) I put the same number in both places. I endorse Manfred's point that the multiplicity-weighted version Rmeas (aka Rrim) is a better measure than Rmerge.

As Eleanor pointed out, the definitions used in Scala are in a CCP4 study weekend article:
Acta Cryst. (2006). D62, 72-82 [doi:10.1107/S0907444905036693]

but note that the printers managed to lose a Sqrt in the definitions of Rmeas and Rpim, in the terms (n/(n-1)) and (1/(n-1)).

Writing the j'th observation of reflection h as Ihj

Rmerge = Sum(h) Sum(j) |I(hj) - <Ih>| / Sum(h) Sum(j) <Ih>

Rmeas  = Sum(h) [ Sqrt(n/(n-1)) Sum(j) |I(hj) - <Ih>| ] / Sum(h) Sum(j) <Ih>

Rpim   = Sum(h) [ Sqrt(1/(n-1)) Sum(j) |I(hj) - <Ih>| ] / Sum(h) Sum(j) <Ih>

where n is the number of observations of reflection h (i.e. j = 1,n)
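To make the bookkeeping explicit, here is a minimal sketch (in Python, not the Scala source) of how those three statistics could be accumulated from observations grouped by unique reflection. It uses unweighted means for <Ih>, so it illustrates the formulas rather than reproducing Scala's output exactly.

```python
from math import sqrt

def merging_r_factors(obs_by_refl):
    """obs_by_refl: dict mapping a unique reflection index h -> list of
    observed intensities I(hj).  Returns (Rmerge, Rmeas, Rpim).
    Uses unweighted <Ih>; reflections measured only once are skipped,
    since n - 1 = 0 and they contribute nothing to the numerators."""
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in obs_by_refl.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_i = sum(intensities) / n                    # <Ih>
        dev = sum(abs(i - mean_i) for i in intensities)  # Sum(j) |I(hj) - <Ih>|
        num_merge += dev
        num_meas += sqrt(n / (n - 1)) * dev
        num_pim += sqrt(1.0 / (n - 1)) * dev
        denom += n * mean_i                              # Sum(j) <Ih>
    return num_merge / denom, num_meas / denom, num_pim / denom

# e.g. two reflections with 3 and 2 observations each (made-up numbers)
example = {(1, 2, 3): [100.0, 110.0, 95.0], (2, 0, 0): [50.0, 52.0]}
print(merging_r_factors(example))
```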

2. The <I/sigma> definition used in Scala (labelled "Mn(I/sd)" in the table) is calculated as follows:

(a) apply the "corrections" Sdfac, SdB, SdAdd to the individual estimated sigma(Ihj) to get sigma'(Ihj)
(b) get the weighted mean <Ih> for reflection h and its estimated sd sigma(<Ih>) from the sigma'(Ihj)
(c) average <Ih>/sigma(<Ih>) in resolution shells: < <Ih>/sigma(<Ih>) > == Mn(I/sd)
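A rough sketch of those three steps, again in Python and again not the real code; the functional form used in corrected_sigma below is only a placeholder for whatever Sdfac/SdB/SdAdd correction Scala actually applies, and the resolution-shell assignment is left to the caller.

```python
from math import sqrt
from collections import defaultdict

def corrected_sigma(i_obs, sigma_obs, sdfac=1.0, sdb=0.0, sdadd=0.0):
    # Placeholder for step (a): inflate the raw sigma using the three
    # "correction" factors.  This exact expression is an assumption,
    # not necessarily what Scala does internally.
    return sdfac * sqrt(sigma_obs ** 2 + sdb * i_obs + (sdadd * i_obs) ** 2)

def mn_i_over_sd(obs_by_refl, shell_of):
    """obs_by_refl: dict {h: [(I(hj), sigma(hj)), ...]}
    shell_of: function mapping a reflection index h to a resolution shell.
    Returns {shell: Mn(I/sd)}, the mean of <Ih>/sigma(<Ih>) in each shell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for h, observations in obs_by_refl.items():
        # step (b): inverse-variance weighted mean and its estimated sd
        weights = [1.0 / corrected_sigma(i, s) ** 2 for i, s in observations]
        mean_i = sum(w * i for w, (i, _) in zip(weights, observations)) / sum(weights)
        sigma_mean = sqrt(1.0 / sum(weights))
        # step (c): accumulate <Ih>/sigma(<Ih>) in this reflection's shell
        shell = shell_of(h)
        totals[shell] += mean_i / sigma_mean
        counts[shell] += 1
    return {shell: totals[shell] / counts[shell] for shell in totals}
```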

Note that this is only really useful if the estimated sigmas are valid estimates of the true error, and this is very difficult to do properly. The latest pre-release versions of Scala do have a more automated estimation of the SD "correction" (or fudge...) factors, which I'm still trying to improve, but it is important to realise that all these statistics, including Rfactors, measure internal consistency rather than absolute accuracy.


The more general point is: why do we want to look at these statistics? What is the question?

(i) I have several datasets from different crystals: which is the "best"? Judge on Rfactor, I/sigma and completeness (multiplicity improves Rmeas & I/sigma).

(ii) Should this dataset be thrown away? Not if it's the best you have.

(iii) What is the "resolution" of the dataset? Where should we cut it? This is a difficult question - it depends on what you are going to use the data for, and it is affected by anisotropy (which is treated badly or not at all by most current programs). The real question is: if I add another shell of data, does it add any useful information? In most cases, though, the cutoff isn't critical except for referees (1.99Å resolution, anyone?)

I could go on, but back to work trying to improve the programs ...

Phil


On 18 Jan 2008, at 20:10, Edwin Pozharski wrote:

Chris Putnam wrote:
I won't belabor this point (or defend this view) any further, though I will repeat my surprise at the lack of a clear consensus for what Rsym and Rmerge actually mean, as opposed to things like I/sigma, for example.

I/sigma is also open to interpretation. Is it <I>/<sigma> or <I/sigma> (averaged over all the reflections in a given resolution shell)?
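A two-reflection toy example (made-up numbers) showing that the two conventions do give different answers for the same shell:

```python
intensities = [10.0, 40.0]
sigmas = [5.0, 8.0]

# <I>/<sigma>: ratio of the shell-averaged quantities
print((sum(intensities) / 2) / (sum(sigmas) / 2))           # 25/6.5 ~ 3.85

# <I/sigma>: average of the per-reflection ratios
print(sum(i / s for i, s in zip(intensities, sigmas)) / 2)  # (2 + 5)/2 = 3.5
```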
