Dear Derek,

I suggest you not use cross-validation at all. With small data sets, refinement with cross-validation is very unstable and depends on the choice of the TEST set. We explained why, and suggested an alternative target function that can use all of the data in refinement, in the following paper:
Acta Cryst. (2014). D70, 3124-3134 [doi:10.1107/S1399004714021336 <http://dx.doi.org/10.1107/S1399004714021336>]
Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures
J. Praznikar <http://scripts.iucr.org/cgi-bin/citedin?search_on=name&author_name=Praznikar%2C%20J%2E> and D. Turk <http://scripts.iucr.org/cgi-bin/citedin?search_on=name&author_name=Turk%2C%20D%2E>
Synopsis: The maximum-likelihood free-kick target, which calculates model error estimates from the work set and a randomly displaced model, proved superior in the accuracy and consistency of refinement of crystal structures compared with the maximum-likelihood cross-validation target, which calculates error estimates from the test set and the unperturbed model.
Online 22 November 2014

best regards,
dusan

> On Dec 20, 2014, at 1:05 AM, CCP4BB automatic digest system <lists...@jiscmail.ac.uk> wrote:
>
> Date: Fri, 19 Dec 2014 11:18:37 +0000
> From: Derek Logan <derek.lo...@biochemistry.lu.se>
> Subject: Cross-validation when test set is miniscule
>
> Hi everyone,
>
> Right now we have one of those very difficult Rfree situations where it's
> impossible to generate a single meaningful Rfree set. Since we're in a bit of
> a hurry with this structure it would be good if someone could point me in the
> right direction. We have crystals with 1542 non-H atoms in the asymmetric
> unit that diffract to only 3.6 Å in P65, which gives us a whopping 2300
> reflections in total. 5% of this is only about 100 reflections. Luckily the
> protein is only a single point mutation of a wild type that has been solved
> to much better resolution, so we know what it should look like and I simply
> want to investigate the effect of different levels of conservatism in the
> refinement, e.g. NCS in xyz and B, group B-factors, reference model,
> Ramachandran restraints etc. However since the quality criterion for this is
> Rfree I'm not able to do this.
>
> I believe the correct approach is k-fold statistical cross-validation, but
> can someone remind me of the correct way to do this? I've done a bit of
> Googling without finding anything very helpful.
>
> Thanks
> Derek
> ________________________________________________________________________
> Derek Logan                                    tel: +46 46 222 1443
> Associate Professor                            mob: +46 76 8585 707
> Dept. of Biochemistry and Structural Biology   www.cmps.lu.se <http://www.cmps.lu.se/>
> Centre for Molecular Protein Science           www.maxlab.lu.se/crystal <http://www.maxlab.lu.se/crystal>
> Lund University, Box 124, 221 00 Lund, Sweden  www.saromics.com <http://www.saromics.com/>

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS "Jozef Stefan"
e-mail: dusan.t...@ijs.si
phone: +386 1 477 3857       Dept. of Biochem.& Mol.& Struct. Biol.
fax:   +386 1 477 3984       Jozef Stefan Institute
Jamova 39, 1 000 Ljubljana, Slovenia
Skype: dusan.turk (voice over internet: www.skype.com <http://www.skype.com/>)
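
For the k-fold procedure asked about in the quoted message, a minimal Python sketch of the bookkeeping might look like the following. It is only an illustration of the partitioning, not the free-kick target recommended above, and it does not perform any refinement. The function names (assign_kfold_flags, fold_indices, pooled_r) are hypothetical and do not come from any crystallographic package. The choice of 20 folds is an assumption, picked so that each fold corresponds to the 5% test-set size mentioned in the question.

```python
import random


def assign_kfold_flags(n_reflections, k, seed=0):
    """Give every reflection a fold label 0..k-1; fold sizes differ by at most one."""
    flags = [i % k for i in range(n_reflections)]
    random.Random(seed).shuffle(flags)  # random assignment, reproducible via seed
    return flags


def fold_indices(flags, fold):
    """Split reflection indices into (work, test) for one fold."""
    work = [i for i, f in enumerate(flags) if f != fold]
    test = [i for i, f in enumerate(flags) if f == fold]
    return work, test


def pooled_r(per_fold_sums):
    """Pool R over folds from (sum of ||Fo| - |Fc||, sum of |Fo|) pairs.

    Uses the usual definition R = sum(||Fo| - |Fc||) / sum(|Fo|),
    accumulated over the test reflections of all folds.
    """
    numerator = sum(num for num, _ in per_fold_sums)
    denominator = sum(den for _, den in per_fold_sums)
    return numerator / denominator


# 2300 reflections in 20 folds gives 115 test reflections per fold.
flags = assign_kfold_flags(2300, 20)
for fold in range(20):
    work, test = fold_indices(flags, fold)
    # A separate refinement against `work` would go here (not shown);
    # its test-set sums would then be collected for pooled_r.
```

Each reflection ends up in exactly one test fold, so pooling the test-set sums over all 20 refinements gives a free R computed from all 2300 reflections rather than from a single set of about 100.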