> The idea was to cut all datasets at say 30% CC1/2 to see how they differ in resolution, I/sigI etc. for that given CC1/2 …
I am not sure what insight that would give you. CC1/2 and the mean I/sigI of the merged data are related quantities; the relation is given in (1). The formula in "Box 1" of that paper shows that a CC1/2 of 20% corresponds to an average I/sigI of the merged data of around 1, and 30% corresponds to about 1.3. The advantage of CC1/2 over the mean I/sigI is that the sigmas are not required. Sigmas are difficult to get right, or even consistent: different programs produce different sigmas for the same data.

Furthermore, correlation coefficients have known statistical properties, e.g. their "significance" (the probability of a given value, or higher, arising by chance) can be calculated. If that "significance" has a low numerical value (e.g. 0.1%), then you may conclude that the value is due to signal in your data; in this example, only in (statistically) 1 out of 1000 cases would you _wrongly_ conclude that there is signal. Whether a correlation coefficient is significant at a given "significance level" (e.g. 0.1%, the level that results in a "*" being appended to the numerical value in CORRECT.LP and XSCALE.LP) depends on its numerical value and on the number of unique reflections it is based upon. There is thus no fixed cutoff. BTW, no such insight is available for the mean I/sigI of the merged data: people using a cutoff of 2 (or 3, or 1) for the mean I/sigI are just using an arbitrary number, as if it were magic. The same goes for a CC1/2 cutoff of 20% or 30% or ...: as long as it is "significant", the choice is arbitrary. CC1/2 = 14.3% is the value where the correlation of the merged intensities with the (unknown) true intensities can be expected to be 50% - this is just to put the numbers into perspective, and is not to be used as a cutoff.

For refinement, there is no "best" cutoff that always works. Whether the weak intensities in the high-resolution data contribute information depends on the accuracy of the model. There is a useful test called "paired refinement" (refine the same model against data cut at two different resolutions and compare R/Rfree at the common, lower resolution) that helps to find out whether the weak data really improve the model or not. It is rather simple to apply that test (PDB_REDO does it in an automated way), but its outcome depends on both the accuracy of the data and the accuracy of the model. It is safe to err on the side of a "too optimistic" high-resolution cutoff, because the model is not degraded by using those data; but cutting "too low" may mean missing the opportunity to get a better model.

One insight (Garib Murshudov) is that if the R/Rfree of your model in the high-resolution shell is >42% (assuming no twinning or tNCS), then this matches what would be obtained by refining the correct model against constant intensities (as derived from the Wilson plot) - an indication that one should rather not use the data beyond this resolution for refinement, or that the model has significant errors.

Hope this helps,
Kay

(1) Karplus, P.A. & Diederichs, K. (2015). Assessing and maximizing data quality in macromolecular crystallography. Curr. Opin. Struct. Biol. 34, 60-68; online at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684713
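P.S. In case anyone wants to put numbers on the relations mentioned above, here is a small Python sketch (needs scipy). Two caveats: the closed form used for the CC1/2 <-> mean I/sigI relation is just a convenient expression that reproduces the two values quoted above (20% -> ~1.0, 30% -> ~1.3) - please consult Box 1 of (1) for the exact formula - and the significance calculation is the standard Student's t test for a correlation coefficient, which may differ in detail from what CORRECT.LP / XSCALE.LP actually compute.

# Illustrative numbers for the quantities discussed above.
# Assumption: the CC1/2 <-> <I/sigI> relation is written here as
# <I/sigI> ~ 2*sqrt(CC1/2/(1-CC1/2)), a form chosen only because it
# reproduces the two values quoted in the text; see Box 1 of (1).
import math
from scipy import stats

def isigi_from_cchalf(cc_half):
    """Approximate mean I/sigI of the merged data for a given CC1/2."""
    return 2.0 * math.sqrt(cc_half / (1.0 - cc_half))

def cc_star(cc_half):
    """CC* = sqrt(2*CC1/2/(1+CC1/2)): expected correlation of the merged
    intensities with the (unknown) true intensities."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))

def cc_significance(cc, n_unique):
    """One-sided p-value for observing a correlation >= cc by chance alone,
    given n_unique reflection pairs (standard Student's t test for r = 0)."""
    t = cc * math.sqrt((n_unique - 2) / (1.0 - cc * cc))
    return stats.t.sf(t, df=n_unique - 2)

for cc_half in (0.143, 0.20, 0.30):
    print(f"CC1/2 = {cc_half:.3f}:  <I/sigI> ~ {isigi_from_cchalf(cc_half):.2f},"
          f"  CC* ~ {cc_star(cc_half):.2f}")

# Whether a given CC1/2 is "significant" depends on the number of unique
# reflections it is based on, not on a fixed numerical cutoff:
for n in (100, 500, 2000):
    print(f"CC1/2 = 0.10 from {n:5d} reflections: p = {cc_significance(0.10, n):.4f}")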