Using a percentage might be justified as a trade-off between having ample free 
reflections for statistics and cutting too deeply into your completeness of 
working reflections for refinement.

It seems to be generally agreed that 2000 free reflections is sufficient to 
guide your choice of refinement strategy. You don't _need_ more. I think it is 
also agreed that 95% completeness of data is sufficient to solve a structure 
with high quality. You don't _need_ more to solve it.
But now if you want to look at R and R-free in 40 resolution bins, that's only 50 
free reflections in each bin, and you are likely to see unexpected things like Rfree 
< Rwork in some bins, just by statistics. More would be better. On the other hand, 
if we have a lot of residues just on the Ramachandran border between allowed and 
generously allowed, we might suspect that more working reflections would help to 
pull them inside. There is a trade-off.
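
To put rough numbers on the per-bin argument above, here is a minimal sketch. It assumes equal-population resolution bins and a crude 1/sqrt(n) counting-statistics scaling for the relative scatter of a per-bin Rfree; both are simplifying assumptions of mine, and the function names are hypothetical, not taken from any package.

```python
import math

def free_per_bin(n_free, n_bins):
    """Free reflections per bin, assuming equal-population bins."""
    return n_free / n_bins

def rough_relative_scatter(n):
    """Crude 1/sqrt(n) rule of thumb; not a rigorous Rfree error estimate."""
    return 1.0 / math.sqrt(n)

for n_free in (1000, 2000, 5000):
    per_bin = free_per_bin(n_free, 40)
    scatter = rough_relative_scatter(per_bin)
    print(f"{n_free} free / 40 bins = {per_bin:.0f} per bin, "
          f"relative scatter ~ {100 * scatter:.0f}%")
```

With only 50 reflections per bin, a scatter of this order is comparable to a typical Rwork/Rfree gap, which is why Rfree < Rwork in an occasional bin is not surprising.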

If your entire dataset is only 20,000 reflections, 2000 free reflections already 
cost you 10% of your data for refinement, so you make do with 2000 or even 1000 
free reflections and use fewer bins. If you have 400,000 reflections you wouldn't 
think twice about using 4,000 or 5,000 reflections to get better statistics in the 
bins. You probably wouldn't want to use 5% in that case, but if you did it 
might not greatly hurt your structure. So I guess we need some guideline: as we 
get more and more reflections, what is the best way to allocate between free 
and working? Probably something between a fixed absolute minimum for the number of 
free reflections and a constant percentage of reflections, perhaps with a cap 
above which it really wouldn't help to have more free reflections. I 
believe one of the non-CCP4 crystallography packages does have a cap, using the 
lesser of a fixed percentage and a fixed constant number. Yes, the default in a 
recent .eff file is:
        fraction = 0.1
        max_free = 2000


Of course, having a huge number of reflections doesn't necessarily mean an 
over-determined structure. We could be talking about a ribosome at 4A or 
something, with tough building, and you could be loath to give up more data than 
you have to.

eab

On 11/20/2014 05:43 PM, Keller, Jacob wrote:
Dear Crystallographers,

I thought that for reliable values for Rfree, one needs only to satisfy 
counting statistics, and therefore using at most a couple thousand reflections 
should always be sufficient. Almost always, however, some seemingly-arbitrary 
percentage of reflections is used, say 5%. Is there any rationale for using a 
percentage rather than some absolute number like 1000?

All the best,

Jacob

*******************************************
Jacob Pearson Keller, PhD
Looger Lab/HHMI Janelia Research Campus
19700 Helix Dr, Ashburn, VA 20147
email: kell...@janelia.hhmi.org
*******************************************
