Using a percentage might be justified as a trade-off between having ample free reflections for statistics and cutting too deeply into your completeness of working reflections for refinement.
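To put rough numbers on that trade-off, here is a minimal Python sketch. It is only an illustration: the names `allocate_free` and `summarize` are made up for this example, the default values are not taken from any package, and the 1/sqrt(n) scatter is a crude counting-statistics assumption rather than a proper error estimate for Rfree. It picks the free-set size as the lesser of a fixed fraction and a fixed cap, then reports how much working data that costs and how many free reflections land in each resolution bin (say 40 bins).

```python
import math


def allocate_free(n_total, fraction=0.05, max_free=2000):
    """Hypothetical helper: free-set size as the lesser of a fraction and a cap.

    The defaults are illustrative only, not any program's documented defaults.
    """
    return min(round(n_total * fraction), max_free)


def summarize(n_total, n_bins=40, fraction=0.05, max_free=2000):
    """Report what a given allocation costs and what it buys per bin."""
    n_free = allocate_free(n_total, fraction=fraction, max_free=max_free)
    per_bin = n_free / n_bins
    return {
        "n_free": n_free,
        # Share of the data withheld from refinement.
        "working_fraction_lost": n_free / n_total,
        # Average number of free reflections per resolution bin.
        "free_per_bin": per_bin,
        # Crude assumption: relative scatter scales like 1/sqrt(count).
        "rel_scatter_per_bin": 1 / math.sqrt(per_bin) if per_bin > 0 else float("inf"),
    }


if __name__ == "__main__":
    for n_total, fraction, max_free in [
        (20_000, 0.10, 2000),   # small dataset: the cap equals 10% of the data
        (400_000, 0.05, 5000),  # large dataset: the cap, not the 5%, decides
    ]:
        print(n_total, summarize(n_total, fraction=fraction, max_free=max_free))
```

With a small dataset the fraction and the cap bite at about the same point; with a large one the cap decides, and the share of data withheld from refinement becomes small.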
It seems to be generally agreed that 2000 free reflections are sufficient to guide your choice of refinement strategy. You don't _need_ more. I think it is also agreed that 95% completeness of data is sufficient to solve a structure with high quality. You don't _need_ more to solve it.

But now if you want to look at R and R-free in 40 resolution bins, that's only 50 free reflections in each bin, and you are likely to see unexpected things like Rfree < Rwork in some bins, just by statistics. More free reflections would be better. On the other hand, if we have a lot of residues just on the Ramachandran border between allowed and generously allowed, we might suspect that more working reflections would help to pull them inside.

There is a trade-off. If your entire dataset is only 20,000 reflections, 2000 free reflections already means giving up 10% of your data for refinement, so you make do with 2000 or even 1000 free reflections and use fewer bins. If you have 400,000 reflections you wouldn't think twice about using 4,000 or 5,000 reflections to get better statistics in the bins. You probably wouldn't want to use 5% in that case, but if you did it might not greatly hurt your structure.

So I guess we need some guideline: as we get more and more reflections, what is the best way to allocate between free and working? Probably something between a fixed absolute minimum number of free reflections and a constant percentage of reflections, perhaps with a cap above which more free reflections really wouldn't help.

I believe one of the non-CCP4 crystallography packages does have a cap, using the lesser of a fixed percentage and a fixed constant number. Yes, the default in a recent .eff file is:

  fraction = 0.1
  max_free = 2000

Of course, having a huge number of reflections doesn't necessarily mean an over-determined structure. We could be talking about a ribosome at 4 Å or something, tough building, and you could be loath to give up more data than you have to.

eab

On 11/20/2014 05:43 PM, Keller, Jacob wrote:
Dear Crystallographers,

I thought that for reliable values for Rfree, one needs only to satisfy counting statistics, and therefore using at most a couple thousand reflections should always be sufficient. Almost always, however, some seemingly-arbitrary percentage of reflections is used, say 5%. Is there any rationale for using a percentage rather than some absolute number like 1000?

All the best,
Jacob

*******************************************
Jacob Pearson Keller, PhD
Looger Lab/HHMI Janelia Research Campus
19700 Helix Dr, Ashburn, VA 20147
email: kell...@janelia.hhmi.org
*******************************************