Dear George,

I would still maintain that values of Rfree where the refinement had not attained convergence are totally uninformative, so I would say you made the right call! During a refinement run, Rfree is often observed to fall initially and then increase towards the end, though usually not significantly. One cannot deduce anything from this behaviour, and indeed it is not at all surprising: since Rfree is not the target function of the optimisation (or even correlated with it) there's no reason why it should do anything in particular.

Exactly the same applies to Rwork: because it's a completely different function from the target function (it contains no weighting information for one thing), there's absolutely no reason why Rwork should be a minimum at convergence (even in the case of unrestrained refinement, and even though it surely is correlated with the target function). If that were true we would be able to use Rwork as the target function!
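To make the point about Rwork and the target being different functions concrete, here is a minimal sketch. It assumes a simple weighted least-squares residual as a stand-in for the real refinement target (actual programs use likelihood targets plus restraints), and the function names and all numbers are hypothetical, chosen only for illustration.

```python
def r_factor(f_obs, f_calc):
    """Conventional unweighted R-factor: sum|Fo - Fc| / sum(Fo)."""
    num = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(f_obs)


def weighted_ls_target(f_obs, f_calc, sigma):
    """Stand-in target (an assumption, not any program's actual target):
    sum(((Fo - Fc) / sigma)**2), which, unlike R, uses the weights."""
    return sum(((fo - fc) / s) ** 2 for fo, fc, s in zip(f_obs, f_calc, sigma))


# Made-up amplitudes and sigmas, for illustration only.
f_obs = [100.0, 10.0]
sigma = [10.0, 1.0]
model_a = [95.0, 8.0]   # fits the strong, imprecise reflection better
model_b = [92.0, 10.0]  # fits the weak, precise reflection better

for name, f_calc in (("A", model_a), ("B", model_b)):
    print(name,
          "R =", round(r_factor(f_obs, f_calc), 4),
          "target =", round(weighted_ls_target(f_obs, f_calc, sigma), 2))
```

With these numbers model B has the lower weighted target but the higher R, so an optimiser driving the target down can legitimately push R up.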
The test for overfitting can only be done if you have at least 2 refinement runs done with different protocols (e.g. number of waters added) to compare: the one with the higher Rfree (or lower free likelihood) at convergence is overfitted. Note that this is a relative test: you can never be sure that a particular model is not overfitted. It's always possible for someone to come along in the future using a different parameter set (or different weighting) and produce a lower Rfree than you did (using the same data of course), making your model overfitted after the fact!

Cheers

-- Ian

> -----Original Message-----
> From: George M. Sheldrick [mailto:gshe...@shelx.uni-ac.gwdg.de]
> Sent: 16 February 2009 11:24
> To: Ian Tickle
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] unstable refinement
>
> Dear Ian,
>
> That was in fact one of my reasons for only calculating the free R
> at the end of a SHELXL refinement run (the other reason, now less
> important, was to save some CPU time). I have to add that I am no
> longer completely convinced that I made the right decision all
> those years ago. A stable refinement in which R decreases but
> Rfree goes through a minimum and then starts to rise might be a
> useful indication of overfitting?!
>
> Best wishes, George
>
> Prof. George M. Sheldrick FRS
> Dept. Structural Chemistry,
> University of Goettingen,
> Tammannstr. 4,
> D37077 Goettingen, Germany
> Tel. +49-551-39-3021 or -3068
> Fax. +49-551-39-22582
>
> On Mon, 16 Feb 2009, Ian Tickle wrote:
>
> > Clemens, I know we've had this discussion several times before, but I'd
> > like to take you up on the point you made that reducing Rfree-R is
> > necessarily always a 'good thing'. Suppose the refinement had started
> > from a point where Rfree was biased, e.g. the test set in use had
> > previously been part of the working set, so that Rfree-R was too small.
> > In that case one would hope and indeed expect that Rfree-R would
> > increase on further refinement now excluding the test set. Shouldn't
> > the criterion be that Rfree-R should attain its expected value
> > (dependent of course on the observation/parameter ratio and the
> > weighting parameters), so a high value of |(Rfree-R) - <Rfree-R>| is
> > bad, i.e. any significant deviations of (Rfree-R) from its expectation
> > are bad?
> >
> > I would go further than that and say that anyway Rfree is meaningless
> > unless the refinement has converged, i.e. reached its maximum (local or
> > global) total likelihood (i.e. data+restraints). So one simply cannot
> > compare the Rfree (or Rfree-R) values at the beginning and end of a run.
> > The purpose of Rfree (or better free likelihood) is surely to compare
> > the *results* of *different* runs where convergence has been attained
> > and where the *refinement protocol* (i.e. selection of parameters to
> > vary and weighting parameters) has been varied, and then to choose as
> > the optimal protocol (and therefore optimal result) the one that gave
> > the lowest Rfree (or highest free likelihood).
> >
> > Rfree-R is then used as a subsidiary test to verify that it has attained
> > its expected value, if not then something is wrong, i.e. either the
> > refinement didn't converge (Rfree-R lower than <Rfree-R>) or there are
> > non-random errors (Rfree-R higher than <Rfree-R>), or a combination of
> > factors.
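The selection procedure described above can be sketched roughly as follows. This is only an illustration: the `RefinementRun` record, the function names and every number are hypothetical, and the expected gap <Rfree-R> is taken as a user-supplied input because the thread does not give a formula for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RefinementRun:
    protocol: str      # e.g. which parameters were varied, weighting used
    r_work: float
    r_free: float
    converged: bool    # whether the run reached its likelihood maximum


def pick_protocol(runs: List[RefinementRun]) -> Optional[RefinementRun]:
    """Return the converged run with the lowest Rfree.

    Returns None when fewer than two converged runs are available,
    since the comparison is relative and needs at least two protocols.
    """
    converged = [run for run in runs if run.converged]
    if len(converged) < 2:
        return None
    return min(converged, key=lambda run: run.r_free)


def gap_deviation(run: RefinementRun, expected_gap: float) -> float:
    """(Rfree - R) minus its expected value; expected_gap is an assumed input."""
    return (run.r_free - run.r_work) - expected_gap


# Made-up values for illustration only.
runs = [
    RefinementRun("120 waters", r_work=0.20, r_free=0.25, converged=True),
    RefinementRun("250 waters", r_work=0.18, r_free=0.27, converged=True),
    RefinementRun("400 waters", r_work=0.17, r_free=0.24, converged=False),
]

best = pick_protocol(runs)
if best is not None:
    print(best.protocol, round(gap_deviation(best, expected_gap=0.04), 3))
```

The unconverged run is ignored even though its Rfree is the lowest, which is the point being argued: only values at convergence are compared, and the gap check is a secondary sanity test on the chosen run.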
> >
> > Cheers
> >
> > -- Ian
> >
> > > -----Original Message-----
> > > From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk] On
> > > Behalf Of Clemens Vonrhein
> > > Sent: 13 February 2009 17:15
> > > To: CCP4BB@JISCMAIL.AC.UK
> > > Subject: Re: [ccp4bb] unstable refinement
> > >
> > > * you don't mention if the R and Rfree move up identically - or if you
> > >   have a faster increase in R than in Rfree, which would mean that
> > >   your R-factors are increasing (bad I guess) but your Rfree-R gap is
> > >   closing down (good).
> > >
> > >   So moving from R/Rfree=0.20/0.35 to R/Rfree=0.32/0.37 is different
> > >   than moving from R/Rfree=0.20/0.25 to R/Rfree=0.23/0.28.
> > >
> >
> > Disclaimer
> > This communication is confidential and may contain privileged
> > information intended solely for the named addressee(s). It may not be used
> > or disclosed except for the purpose for which it has been sent. If you are
> > not the intended recipient you must not review, use, disclose, copy,
> > distribute or take any action in reliance upon it. If you have received
> > this communication in error, please notify Astex Therapeutics Ltd by
> > emailing i.tic...@astex-therapeutics.com and destroy all copies of the
> > message and any attached documents.
> >
> > Astex Therapeutics Ltd monitors, controls and protects all its messaging
> > traffic in compliance with its corporate email policy. The Company accepts
> > no liability or responsibility for any onward transmission or use of
> > emails and attachments having left the Astex Therapeutics domain. Unless
> > expressly stated, opinions in this message are those of the individual
> > sender and not of Astex Therapeutics Ltd. The recipient should check this
> > email and any attachments for the presence of computer viruses. Astex
> > Therapeutics Ltd accepts no liability for damage caused by any virus
> > transmitted by this email.
> > E-mail is susceptible to data corruption, interception, unauthorized
> > amendment, and tampering, Astex Therapeutics Ltd only send and receive
> > e-mails on the basis that the Company is not liable for any such
> > alteration or any consequences thereof.
> >
> > Astex Therapeutics Ltd., Registered in England at 436 Cambridge Science
> > Park, Cambridge CB4 0QA under number 3751674