Dear Ian,
I totally agree with your observations and recommendations. If one is
concerned about "instability" of the optimizer (minimization and/or
simulated annealing), I suggest also monitoring the value of the total
energy function (X-ray maximum likelihood term plus all restraints).
Another source of slight variations in R values is recalculation of the
bulk solvent mask and model parameters, if the model has moved
significantly between solvent mask updates.
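A minimal sketch of such monitoring, assuming the total target value
(X-ray term plus restraints) has been logged once per cycle. The function
name, the tolerance and the example trace are hypothetical and not taken
from any refinement program. It only makes sense for minimization;
temporary rises are expected during simulated annealing.

```python
def energy_increases(total_energy, rel_tol=1e-3):
    """Return the cycle indices at which the total target value rose.

    total_energy: one value per cycle (X-ray term plus all restraints).
    rel_tol: hypothetical relative tolerance; smaller rises are ignored.
    """
    flagged = []
    for i in range(1, len(total_energy)):
        prev, cur = total_energy[i - 1], total_energy[i]
        if cur - prev > rel_tol * abs(prev):
            flagged.append(i)
    return flagged


# Made-up trace: the target rises at cycle 3, then falls again.
trace = [1000.0, 950.0, 940.0, 960.0, 939.0]
print(energy_increases(trace))  # prints [3]
```

A flagged cycle that coincides with a solvent mask update would point to
the mask recalculation rather than to the optimizer itself.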
Axel
On Feb 16, 2009, at 6:21 AM, Ian Tickle wrote:
Dear George
I would still maintain that values of Rfree where the refinement had not
attained convergence are totally uninformative, so I would say you made
the right call! During a refinement run, Rfree is often observed to
fall initially and then increase towards the end, though usually not
significantly. One cannot deduce anything from this behaviour, and
indeed it is not at all surprising: since Rfree is not the target
function of the optimisation (or even correlated with it) there's no
reason why it should do anything in particular. Exactly the same
applies to Rwork: because it's a completely different function from the
target function (it contains no weighting information for one thing),
there's absolutely no reason why Rwork should be a minimum at
convergence (even in the case of unrestrained refinement, and even
though it surely is correlated with the target function). If that were
true we would be able to use Rwork as the target function!
The test for overfitting can only be done if you have at least 2
refinement runs done with different protocols (e.g. number of waters
added) to compare: the one with the higher Rfree (or lower free
likelihood) at convergence is overfitted. Note that this is a relative
test: you can never be sure that a particular model is not overfitted.
It's always possible for someone to come along in the future using a
different parameter set (or different weighting) and produce a lower
Rfree than you did (using the same data of course), making your model
overfitted after the fact!
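A rough sketch of this relative test, assuming each run is summarised by
a convergence flag and its final Rfree. The dictionary layout, run names
and numbers are invented for illustration. The R factor uses the
conventional definition R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|), with
Fcalc assumed to be already on the scale of Fobs.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor over one reflection set (work or free).

    Assumes f_calc has already been scaled to f_obs.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def compare_converged_runs(runs):
    """Rank converged runs by Rfree; unconverged runs are ignored.

    runs: dict mapping a run name to {"converged": bool, "r_free": float}.
    Returns (best_run, other_runs). The others are overfitted only
    relative to the best run found so far.
    """
    converged = {name: r["r_free"] for name, r in runs.items() if r["converged"]}
    if len(converged) < 2:
        raise ValueError("need at least two converged runs to compare")
    ranked = sorted(converged, key=converged.get)
    return ranked[0], ranked[1:]


# Hypothetical protocols differing in the number of waters added.
runs = {
    "few_waters": {"converged": True, "r_free": 0.262},
    "many_waters": {"converged": True, "r_free": 0.271},
    "unfinished": {"converged": False, "r_free": 0.250},
}
best, others = compare_converged_runs(runs)
print(best, others)
```

The unconverged run is dropped even though its Rfree is the lowest,
because an Rfree taken before convergence carries no information.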
Cheers
-- Ian
-----Original Message-----
From: George M. Sheldrick [mailto:gshe...@shelx.uni-ac.gwdg.de]
Sent: 16 February 2009 11:24
To: Ian Tickle
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] unstable refinement
Dear Ian,
That was in fact one of my reasons for only calculating the free R
at the end of a SHELXL refinement run (the other reason, now less
important, was to save some CPU time). I have to add that I am no
longer completely convinced that I made the right decision all
those years ago. A stable refinement in which R decreases but
Rfree goes through a minimum and then starts to rise might be a
useful indication of overfitting?!
Best wishes, George
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582
On Mon, 16 Feb 2009, Ian Tickle wrote:
Clemens, I know we've had this discussion several times before, but I'd
like to take you up on the point you made that reducing Rfree-R is
necessarily always a 'good thing'. Suppose the refinement had started
from a point where Rfree was biased, e.g. the test set in use had
previously been part of the working set, so that Rfree-R was too small.
In that case one would hope and indeed expect that Rfree-R would
increase on further refinement now excluding the test set. Shouldn't
the criterion be that Rfree-R should attain its expected value
(dependent of course on the observation/parameter ratio and the
weighting parameters), so a high value of |(Rfree-R) - <Rfree-R>| is
bad, i.e. any significant deviations of (Rfree-R) from its expectation
are bad?
I would go further than that and say that anyway Rfree is meaningless
unless the refinement has converged, i.e. reached its maximum (local or
global) total likelihood (i.e. data+restraints). So one simply cannot
compare the Rfree (or Rfree-R) values at the beginning and end of a run.
The purpose of Rfree (or better free likelihood) is surely to compare
the *results* of *different* runs where convergence has been attained
and where the *refinement protocol* (i.e. selection of parameters to
vary and weighting parameters) has been varied, and then to choose as
the optimal protocol (and therefore optimal result) the one that gave
the lowest Rfree (or highest free likelihood).
Rfree-R is then used as a subsidiary test to verify that it has attained
its expected value; if not, then something is wrong, i.e. either the
refinement didn't converge (Rfree-R lower than <Rfree-R>) or there are
non-random errors (Rfree-R higher than <Rfree-R>), or a combination of
factors.
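The subsidiary test can be sketched as follows. This is only an
illustration: the expected gap <Rfree-R> must be supplied by the user
(it depends on the observation/parameter ratio and the weighting, and no
formula for it is given here), and the tolerance of 0.01 is an arbitrary
assumption.

```python
def diagnose_gap(r_work, r_free, expected_gap, tol=0.01):
    """Compare the observed Rfree-R gap with its user-supplied expectation.

    expected_gap: <Rfree-R>, an input to this sketch.
    tol: hypothetical threshold for a "significant" deviation.
    """
    deviation = (r_free - r_work) - expected_gap
    if deviation < -tol:
        return "gap below expectation: refinement may not have converged"
    if deviation > tol:
        return "gap above expectation: possible non-random errors"
    return "gap consistent with expectation"


print(diagnose_gap(r_work=0.20, r_free=0.22, expected_gap=0.05))
print(diagnose_gap(r_work=0.20, r_free=0.30, expected_gap=0.05))
```

The check is symmetric by design: a gap that is too small is treated as
a warning sign, just as a gap that is too large is.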
Cheers
-- Ian
-----Original Message-----
From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk]
On Behalf Of Clemens Vonrhein
Sent: 13 February 2009 17:15
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] unstable refinement
* you don't mention if the R and Rfree move up identically - or if you
  have a faster increase in R than in Rfree, which would mean that
  your R-factors are increasing (bad I guess) but your Rfree-R gap is
  closing down (good).
  So moving from R/Rfree=0.20/0.35 to R/Rfree=0.32/0.37 is different
  than moving from R/Rfree=0.20/0.25 to R/Rfree=0.23/0.28.
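The arithmetic behind these two cases can be checked in a few lines
(with the second Rfree of the first case read as 0.37; the labels "A"
and "B" are added here for illustration only):

```python
def gap(r_work, r_free):
    """Rfree minus R for one (R, Rfree) pair."""
    return r_free - r_work


# (before, after) pairs of (R, Rfree) from the two cases above.
cases = {
    "A": ((0.20, 0.35), (0.32, 0.37)),
    "B": ((0.20, 0.25), (0.23, 0.28)),
}

for label, (before, after) in cases.items():
    r_change = after[0] - before[0]
    print(f"{label}: gap {gap(*before):.2f} -> {gap(*after):.2f}, "
          f"R change {r_change:+.2f}")
# A: gap 0.15 -> 0.05, R change +0.12
# B: gap 0.05 -> 0.05, R change +0.03
```

In case A the gap closes while R rises sharply; in case B both values
drift up together and the gap is unchanged.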
Disclaimer
This communication is confidential and may contain privileged
information intended solely for the named addressee(s). It may not be
used
or disclosed except for the purpose for which it has been sent. If
you
are
not the intended recipient you must not review, use, disclose, copy,
distribute or take any action in reliance upon it. If you have
received
this communication in error, please notify Astex Therapeutics Ltd by
emailing i.tic...@astex-therapeutics.com and destroy all copies of
the
message and any attached documents.
Astex Therapeutics Ltd monitors, controls and protects all its
messaging
traffic in compliance with its corporate email policy. The Company
accepts
no liability or responsibility for any onward transmission or use of
emails and attachments having left the Astex Therapeutics domain.
Unless
expressly stated, opinions in this message are those of the
individual
sender and not of Astex Therapeutics Ltd. The recipient should check
this
email and any attachments for the presence of computer viruses. Astex
Therapeutics Ltd accepts no liability for damage caused by any virus
transmitted by this email. E-mail is susceptible to data corruption,
interception, unauthorized amendment, and tampering, Astex
Therapeutics
Ltd only send and receive e-mails on the basis that the Company is
not
liable for any such alteration or any consequences thereof.
Astex Therapeutics Ltd., Registered in England at 436 Cambridge
Science
Park, Cambridge CB4 0QA under number 3751674
Axel T. Brunger
Investigator, Howard Hughes Medical Institute
Professor of Molecular and Cellular Physiology
Stanford University
Web: http://atbweb.stanford.edu
Email: brun...@stanford.edu
Phone: +1 650-736-1031
Fax: +1 650-745-1463