Eleanor,

But is the R factor a good way to assess this? In fact, who cares if R looks worse - the goal of structure refinement is certainly not to get a better R factor! The R factor, if it is anything, is a measure of comparative model quality, not comparative data quality. What I mean is that while it's valid to use it (and other statistics such as the likelihood) to compare models refined against the same data, it's not valid to use it to compare different subsets of the data, whether the model is kept fixed or not. Likelihood is a function of the model with the data fixed, i.e. L(model | data), not L(data | model) - the latter is the probability of the data given the model.
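To put a rough number on that, here's a toy calculation of my own (made-up values, not output from any refinement program): with one and the same fixed model, the conventional R improves noticeably just because the weak reflections are left out of the comparison.

import numpy as np

rng = np.random.default_rng(0)
n = 10000
f_true = rng.rayleigh(scale=5.0, size=n)          # Wilson-like acentric amplitudes (toy values)
sigma = np.full(n, 2.0)                           # assumed constant measurement error on F
f_obs = f_true + rng.normal(0.0, sigma)           # "observed" amplitudes
f_calc = f_true + rng.normal(0.0, 1.0, size=n)    # one fixed, imperfect model

def r_factor(fo, fc):
    # conventional unweighted R = sum|Fo - Fc| / sum|Fo|
    return np.sum(np.abs(fo - fc)) / np.sum(np.abs(fo))

strong = f_obs / sigma > 3.0                      # keep only the "strong" reflections
print("R, all data   :", round(r_factor(f_obs, f_calc), 3))
print("R, strong only:", round(r_factor(f_obs[strong], f_calc[strong]), 3))
# The second R is lower even though the model has not changed at all.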
The problem with the standard R is that it's unweighted, so weak data will have a disproportionately big effect on it - much more so, in fact, than in ML refinement where, as you say, data with low signal-to-noise will be severely down-weighted and will have minimal effect on the refined structure and the maps. The weighted (aka Hamilton) R would surely be a better way to assess the optimal resolution cut-off. If the outer-shell average I/sigma(I) is, say, 2, then the Wilson distribution implies that, in order to get that average, there must be a significant proportion of useful data with I/sigma(I) > 3 present in the shell (a small numerical sketch of both points follows below the quoted messages). As several people have already pointed out, the important thing is surely not to throw away useful data!

Cheers

-- Ian

On 8 August 2012 10:10, Eleanor Dodson <eleanor.dod...@york.ac.uk> wrote:
> Like Ian, I tend to use as much data as is reasonable - but it is useful to
> look at the R factors plotted against resolution in the REFMAC output. If
> this shoots sky high at the limit, the data is probably not very useful in
> refinement or map calculation (and will automatically be down-weighted by
> the ML weighting). So all it does is make your R factors look worse!
> Eleanor
>
> On 6 Aug 2012, at 12:21, Marcus Fislage wrote:
>
>> Dear all,
>>
>> We have collected a data set in our lab and are discussing where to cut
>> the resolution for refinement. According to the work of Kai Diederichs
>> and Andy Karplus, one should use a CC1/2 of 12.5% (provided it is
>> significant) to determine the high-resolution limit, independent of the
>> I/sigI and R-factor rules used earlier. But I would like to know whether
>> this also holds for low-completeness data.
>> The problem is that in the highest resolution shell we have an I/sigI of
>> 4 and a good CC1/2, but a completeness of only 30%. Which I guess means
>> we measured the high-resolution data very accurately but not completely.
>> Would you still use the low-completeness data in the highest resolution
>> shell, or is that still a valid argument for cutting the data at lower
>> resolution?
>> My guess would be to keep using the data even if the completeness drops,
>> since the data we measured are good and, according to CC1/2, significant.
>> Are we right to do so, or would you disagree?
>>
>> Thanks for any input
>> Marcus
>>
>> --
>> Marcus Fislage
>> Structural Biology Brussels
>> Vrije Universiteit Brussel
>> Department of Structural Biology, VIB
>> Oefenplein, Gebouw E
>> Pleinlaan 2,
>> 1050 Brussel
>> Belgium
>> Tel: +32-2-629 18 51
>> Email : marcus.fisl...@vib-vub.be
>> Url: http://www.verseeslab.structuralbiology.be
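PS: the promised sketch of the two points above - my own illustration, not from REFMAC or any other CCP4 program - assuming acentric reflections (i.e. an exponential Wilson intensity distribution) and roughly constant sigma(I) within the shell.

import numpy as np

# Wilson point: with <I/sigma(I)> = 2 and roughly constant sigma(I) in the shell,
# I/sigma(I) > 3 corresponds to I > 1.5 <I>; for an exponential (acentric Wilson)
# intensity distribution the fraction of such reflections is exp(-1.5), about 22%.
print("fraction with I/sigma(I) > 3:", round(np.exp(-1.5), 2))

# Weighted (Hamilton) R on amplitudes, with w = 1/sigma(F)^2: poorly measured
# reflections get small weights instead of inflating the numerator as they do
# in the conventional unweighted R = sum|Fo - Fc| / sum|Fo|.
def hamilton_r(f_obs, f_calc, sigma_f):
    w = 1.0 / sigma_f**2
    return np.sqrt(np.sum(w * (f_obs - f_calc)**2) / np.sum(w * f_obs**2))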