Dear Ethan, List,

> Surely someone must have done this! But I can't recall ever reading
> an analysis of such a refinement protocol.
> Does anyone know of relevant reports in the literature?

Total statistical cross validation is indeed what we should be doing, but for large structures the computational cost may be significant. In the absence of total statistical cross validation, the reported Rfree may be an 'outlier' (with respect to the distribution of the Rfree values that would have been obtained from all disjoint sets). To tackle this, we usually resort to the following ad hoc procedure. At an early stage of the positional refinement, we use a shell script which:

(a) uses Phil's PDBSET with the NOISE keyword to randomly shift atomic positions,
(b) refines the resulting models with each of the different free sets to completion,
(c) calculates the mean of the resulting free R values,
(d) selects (once and for all) the free set whose Rfree is closest to the mean of the values obtained above.

For structures with a small number of reflections, the statistical noise in the 5% sets can be very significant indeed. We have seen differences between Rfree values obtained from different sets reaching up to 4%. Ideally, instead of PDBSET+REFMAC we should have been using simulated annealing (without positional refinement), but moving back and forth between CNS-XPLOR and CCP4 was too much for my laziness.

All the best,
Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics,
Democritus University of Thrace, University Campus, Dragana,
68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
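Steps (c) and (d) of the procedure described in this message amount to taking the mean of the Rfree values from all free sets and keeping the set whose own Rfree lies nearest to that mean. A minimal Python sketch of that selection might look like the following. The function name, the dictionary layout, and the Rfree numbers are assumptions made up for illustration, not output from any real refinement. Steps (a) and (b), the PDBSET and REFMAC runs, are not shown.

```python
from statistics import mean


def pick_representative_free_set(rfree_by_set):
    """Return (label, mean_rfree) for the free set closest to the mean Rfree.

    rfree_by_set maps a free-set label (hypothetical: an integer flag)
    to the Rfree obtained after refining against that free set.
    """
    if not rfree_by_set:
        raise ValueError("need at least one free set")

    # Step (c): mean of the Rfree values over all free sets.
    mean_rfree = mean(rfree_by_set.values())

    # Step (d): the free set whose Rfree deviates least from that mean.
    chosen = min(
        rfree_by_set,
        key=lambda label: abs(rfree_by_set[label] - mean_rfree),
    )
    return chosen, mean_rfree


# Illustrative numbers only (assumed, not measured).
example = {0: 0.262, 1: 0.281, 2: 0.270, 3: 0.255, 4: 0.295}
chosen, mean_rfree = pick_representative_free_set(example)
print(f"mean Rfree = {mean_rfree:.4f}, chosen free set = {chosen}")
```

With only a handful of free sets the mean is itself noisy, so this picks a typical set rather than a statistically guaranteed one.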