I thought that, as author of Scala, I might put in my 2 penn'th to this
discussion, FWIW.
1. I've never been able to find any useful distinction between Rsym &
Rmerge. When filling in the PDB's request for both (undefined by
them, and irritatingly restricted to < 0.99, at least in Autodep), I put
the same number in both places. I endorse Manfred's point that the
multiplicity-weighted version, Rmeas (aka Rrim), is a better measure than
Rmerge.
As Eleanor pointed out, the definitions used in Scala are in a CCP4
study weekend article:
Acta Cryst. (2006). D62, 72-82 [doi:10.1107/S0907444905036693]
Note, though, that the printers managed to lose a Sqrt in the definitions of
Rmeas and Rpim, on the factors n/(n-1) and 1/(n-1) respectively.
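Writing the j'th observation of reflection h as I(hj), and the mean for reflection h as <Ih>:

$$R_{\mathrm{merge}} = \frac{\sum_h \sum_j \left| I_{hj} - \langle I_h \rangle \right|}{\sum_h \sum_j \langle I_h \rangle}$$

$$R_{\mathrm{meas}} = \frac{\sum_h \sqrt{\frac{n}{n-1}} \sum_j \left| I_{hj} - \langle I_h \rangle \right|}{\sum_h \sum_j \langle I_h \rangle}$$

$$R_{\mathrm{pim}} = \frac{\sum_h \sqrt{\frac{1}{n-1}} \sum_j \left| I_{hj} - \langle I_h \rangle \right|}{\sum_h \sum_j \langle I_h \rangle}$$

where n is the number of observations of reflection h (i.e. j = 1, ..., n).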
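For illustration, the three R-factor definitions can be sketched in Python as below. This is a minimal sketch, not Scala's code. It makes three assumptions: observations are already grouped by unique reflection h, <Ih> is the plain unweighted mean (Scala uses a weighted mean), and reflections measured only once are skipped, because Sqrt(n/(n-1)) is undefined for n = 1. The function name `r_factors` is hypothetical.

```python
from math import sqrt

def r_factors(groups):
    """Return (Rmerge, Rmeas, Rpim) for a mapping h -> list of intensities I(hj).

    Sketch assumptions: <Ih> is the unweighted mean of the observations,
    and reflections with n = 1 are skipped (the Sqrt factors need n >= 2).
    """
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in groups.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_i = sum(intensities) / n
        # Sum(j) |I(hj) - <Ih>| for this reflection
        abs_dev = sum(abs(i - mean_i) for i in intensities)
        num_merge += abs_dev
        num_meas += sqrt(n / (n - 1)) * abs_dev
        num_pim += sqrt(1 / (n - 1)) * abs_dev
        # Sum(j) <Ih> = n * <Ih>
        denom += n * mean_i
    return num_merge / denom, num_meas / denom, num_pim / denom

# Two made-up reflections, measured twice and three times
groups = {(1, 0, 0): [100.0, 110.0], (2, 0, 0): [50.0, 50.0, 56.0]}
rmerge, rmeas, rpim = r_factors(groups)
```

Every reflection contributes the same Sum(j)|I(hj) - <Ih>| to all three numerators, and the denominators are identical. Since Sqrt(1/(n-1)) <= 1 <= Sqrt(n/(n-1)) for n >= 2, this always gives Rpim <= Rmerge <= Rmeas.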
2. The <I/sigma> definition used in Scala (labelled "Mn(I/sd)" in the
table) is calculated as follows:
(a) apply the "correction" factors SdFac, SdB and SdAdd to the individual
estimated sigma(Ihj) to get sigma'(Ihj)
(b) from the sigma'(Ihj), get the weighted mean <Ih> for reflection h and its
estimated sd, sigma(<Ih>)
(c) average <Ih>/sigma(<Ih>) over resolution shells:
$\mathrm{Mn}(I/sd) = \left\langle \langle I_h \rangle / \sigma(\langle I_h \rangle) \right\rangle$
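Steps (a) to (c) can be sketched in Python roughly as follows. The functional form of the SD correction used here is an assumption and is labelled as such in the code: sigma' = SdFac * Sqrt(sigma^2 + SdB*I + (SdAdd*I)^2). So are the inverse-variance weights w = 1/sigma'^2, which give sigma(<Ih>) = 1/Sqrt(Sum w). Scala's actual weighting and correction details may differ. The function names are hypothetical, sigmas are assumed positive, and shell assignment is assumed to have been done already.

```python
from math import sqrt
from collections import defaultdict

def corrected_sigma(i_obs, sigma, sdfac=1.0, sdb=0.0, sdadd=0.0):
    # Step (a), ASSUMED form of the SD "correction":
    #   sigma' = SdFac * Sqrt(sigma^2 + SdB*I + (SdAdd*I)^2)
    # The variance is clamped at zero so a negative I cannot make it negative.
    var = sigma**2 + sdb * i_obs + (sdadd * i_obs) ** 2
    return sdfac * sqrt(max(var, 0.0))

def merge(obs, **sd_params):
    """Step (b): weighted mean <Ih> and sigma(<Ih>) for one reflection.

    obs is a list of (I(hj), sigma(Ihj)) pairs; weights are ASSUMED
    to be inverse variances, w = 1/sigma'^2.
    """
    weights = [1.0 / corrected_sigma(i, s, **sd_params) ** 2 for i, s in obs]
    w_sum = sum(weights)
    mean_i = sum(w * i for w, (i, _) in zip(weights, obs)) / w_sum
    return mean_i, 1.0 / sqrt(w_sum)

def mn_i_over_sd(reflections, **sd_params):
    """Step (c): Mn(I/sd) per shell, the mean of <Ih>/sigma(<Ih>).

    reflections maps h -> (shell_index, obs).
    """
    ratios = defaultdict(list)
    for shell, obs in reflections.values():
        mean_i, sd_mean = merge(obs, **sd_params)
        ratios[shell].append(mean_i / sd_mean)
    return {shell: sum(r) / len(r) for shell, r in ratios.items()}

# Made-up data: one reflection measured twice in shell 0, one measured once in shell 1
reflections = {
    (1, 0, 0): (0, [(100.0, 10.0), (100.0, 10.0)]),
    (2, 0, 0): (1, [(40.0, 4.0)]),
}
by_shell = mn_i_over_sd(reflections)
inflated = mn_i_over_sd(reflections, sdfac=2.0)
```

This makes the caveat below concrete. Doubling SdFac halves every Mn(I/sd) without touching the intensities at all, so the statistic is only as good as the sigma estimates fed into it.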
Note that this is only really useful if the estimated sigmas are valid
estimates of the true error, which is very difficult to achieve
properly. The latest pre-release versions of Scala do have a more
automated estimation of the SD "correction" (or fudge...) factors,
which I'm still trying to improve. But it is important to realise that
all these statistics, including R-factors, measure internal consistency
rather than absolute accuracy.
The more general point is: why do we want to look at these statistics?
What is the question?
(i) I have several datasets from different crystals: which is the
"best"? Judge it on R-factor, I/sigma and completeness (multiplicity
improves Rmeas & I/sigma).
(ii) Should this dataset be thrown away? Not if it's the best you have.
(iii) What is the "resolution" of the dataset? Where should we cut it?
This is a difficult question: it depends on what you are going to use
it for. It is affected by anisotropy (which is treated badly, or not at
all, by most current programs). If I add another shell of data, is it
adding any useful information? However, in most cases it isn't
critical, except for referees (1.99Å resolution, anyone?).
I could go on, but back to work trying to improve the programs ...
Phil
On 18 Jan 2008, at 20:10, Edwin Pozharski wrote:
Chris Putnam wrote:
I won't belabor this point (or defend this view) any further,
though I will repeat my surprise at the lack of a clear
consensus for what Rsym and Rmerge actually mean,
as opposed to things like I/sigma, for example.
I/sigma is also open to interpretation. Is it <I>/<sigma> or <I/sigma>
(averaged over all the reflections in a given resolution shell)?
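To see why the distinction matters, here is a small sketch of the two readings for one resolution shell, given as (I, sigma) pairs. The function names and numbers are made up for illustration. By step (c) of Phil's description, Scala's Mn(I/sd) is of the second kind (a mean of per-reflection ratios), computed on merged intensities.

```python
def ratio_of_means(pairs):
    """<I>/<sigma>: mean intensity divided by mean sigma over the shell."""
    n = len(pairs)
    return (sum(i for i, _ in pairs) / n) / (sum(s for _, s in pairs) / n)

def mean_of_ratios(pairs):
    """<I/sigma>: average of the per-reflection I/sigma over the shell."""
    return sum(i / s for i, s in pairs) / len(pairs)

# One strong and one weak reflection in the same (made-up) shell
shell = [(1000.0, 10.0), (10.0, 5.0)]
a = ratio_of_means(shell)   # 505 / 7.5
b = mean_of_ratios(shell)   # (100 + 2) / 2
```

In this example the strong reflection dominates <I> much more than it dominates <sigma>, so <I>/<sigma> (about 67) comes out well above <I/sigma> (51). Quoting "I/sigma" without saying which one is meant can therefore be misleading.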