> The idea was to cut all datasets at, say, 30% CC1/2 to see how they differ in 
> resolution, I/sigI etc. for that given CC1/2 … 

I'm not sure what insight that would give you. CC1/2 and the mean I/sigI of the 
merged data are related quantities; the relation is given in (1). The formula in 
"Box 1" of that paper shows that a CC1/2 of 20% corresponds to an average 
I/sigI of the merged data of around 1, and 30% to about 1.3.
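To make the quantity concrete: CC1/2 is the Pearson correlation coefficient 
between the merged intensities of two randomly chosen half-datasets. A minimal 
sketch (the synthetic intensities and noise levels are made up for 
illustration; `cc_half` is not any real program's routine):

```python
import numpy as np

def cc_half(i1, i2):
    """Pearson correlation between the merged intensities of two random
    half-datasets -- the definition of CC1/2."""
    return np.corrcoef(i1, i2)[0, 1]

# Synthetic illustration: the same true intensities plus independent
# noise in each half-dataset.  Stronger noise lowers CC1/2.
rng = np.random.default_rng(0)
true_i = rng.exponential(scale=100.0, size=5000)   # Wilson-like intensities
half1 = true_i + rng.normal(0.0, 150.0, size=true_i.size)
half2 = true_i + rng.normal(0.0, 150.0, size=true_i.size)
cc = cc_half(half1, half2)
print(round(cc, 3))
```

Note that no sigmas enter this calculation at all, which is exactly the 
advantage discussed below.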

The advantage of CC1/2 over mean I/sigI is that the sigmas are not required. 
Sigmas are difficult to get right, or even consistent: different programs 
produce different sigmas for the same data.

Furthermore, correlation coefficients have known statistical properties, e.g. 
their "significance" (the probability of a given value, or higher, arising by 
chance) can be calculated. If that "significance" has a low numerical value 
(e.g. 0.1%) then you may conclude that the value is due to signal in your 
data. In this example, you would _wrongly_ conclude that there is signal in 
(statistically) only 1 out of 1000 cases.

Whether a correlation coefficient is significant at a given "significance 
level" (e.g. 0.1% which is the value that results in a "*" appended to the 
numerical value in CORRECT.LP and XSCALE.LP) depends on its numerical value, 
and the number of unique reflections it is based upon. There is thus no fixed 
cutoff. BTW no such insight is available for the mean I/sigI of the merged 
data. 
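The significance calculation can be sketched with the standard Fisher 
z-transform of a correlation coefficient (a textbook approximation; the exact 
test used internally by CORRECT/XSCALE may differ, and the numbers below are 
made up for illustration):

```python
import math

def cc_significance(r, n):
    """One-sided p-value for the null hypothesis that a correlation
    coefficient r computed from n pairs arose by chance (true r = 0),
    using the Fisher z-transform (a standard approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# The same CC1/2 = 0.30 is highly significant when based on many unique
# reflections, but not when based on a handful:
print(cc_significance(0.30, 120))  # comfortably below the 0.1% level
print(cc_significance(0.30, 12))   # not significant
```

This illustrates why there is no fixed CC1/2 cutoff: the same numerical value 
can be significant or not, depending on the number of unique reflections.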

People using a cutoff of 2 (or 3, or 1) for the mean I/sigI are just using an 
arbitrary number, as if it were magic. The same goes for a CC1/2 cutoff of 20% 
or 30% or ...: as long as it is "significant", it is arbitrary. CC1/2 = 14.3% is 
the value where the correlation of the merged intensities with the (unknown) 
true intensities can be expected to be 50% - this is just to put the numbers 
into perspective, and is not to be used as a cutoff. 
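The 14.3% figure follows from the CC* formula of (1), 
CC* = sqrt(2*CC1/2 / (1 + CC1/2)), which estimates the correlation of the 
merged intensities with the true intensities:

```python
import math

def cc_star(cc_half):
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)): the expected correlation of the
    merged intensities with the (unknown) true intensities, given CC1/2
    (formula from Karplus & Diederichs, reference (1))."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

print(cc_star(1.0 / 7.0))  # CC1/2 = 1/7 ~ 14.3%  ->  CC* = 0.5
```

So at CC1/2 = 1/7 the merged data are expected to correlate 50% with the truth, 
which is the perspective mentioned above.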

For refinement, there is no "best" cutoff that always works. It depends on the 
accuracy of the model whether it can extract information from the weak 
intensities in the high-resolution data. There is a useful test called "paired 
refinement" that helps find out whether the weak data really improve the model, 
or not. It is rather simple to apply that test (PDB_REDO does it in an 
automated way) but its outcome depends on both the accuracy of the data, and 
the accuracy of the model. 
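The logic of the paired-refinement decision can be sketched as follows (the 
function name, the Rfree numbers and the margin are made up for illustration; 
the essential point is that both refinements must be scored against the same 
reflections, i.e. those up to the lower-resolution cutoff):

```python
def extra_shell_helps(rfree_low_cut, rfree_high_cut_rescored, margin=0.002):
    """Paired refinement: refine once against data cut at the lower
    resolution, once including the extra high-resolution shell, then
    compute Rfree of BOTH refined models against the same reflections
    (those up to the lower cutoff).  Keep the extra shell only if Rfree
    improves by at least `margin` (an arbitrary small threshold)."""
    return rfree_high_cut_rescored <= rfree_low_cut - margin

# Hypothetical numbers: if including the extra shell lowers the rescored
# Rfree from 0.250 to 0.245, keep the higher-resolution data.
print(extra_shell_helps(0.250, 0.245))
print(extra_shell_helps(0.250, 0.251))
```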

It is safe to err on the side of a "too optimistic" high-resolution cutoff, 
because including those weak data does not degrade the model. But cutting 
"too low" may mean missing the opportunity to get a better model. 

One insight (due to Garib Murshudov) is that if the R/Rfree of your model in 
the high-resolution shell is >42% (assuming no twinning or tNCS), then it 
matches what refinement of the correct model against constant intensities (as 
derived from the Wilson plot) would yield - an indication that the data beyond 
this resolution should rather not be used for refinement, or that the model 
has significant errors.

Hope this helps,

Kay

(1) Karplus, P.A., Diederichs, K. (2015) Assessing and maximizing data quality 
in macromolecular crystallography. Curr. Opin. Struct. Biol. 34, 60-68; online 
at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684713
