Re: [ccp4bb] Rfree in similar data set

Dirk Kostrewa Thu, 24 Sep 2009 05:21:13 -0700

Hi Ian,

consider the case where two data sets have been collected from the same crystal (or a crystal from the same drop), each processed separately, and the structure refined against one of the two data sets until convergence. The two data sets will be somewhat different due to measurement errors but still very similar. Thus, when I take the refined structure and re-refine it against the second data set using the same indices for working and test set (and the same refinement parameters), both the starting R and Rfree will not have converged against the second data set, but will be similar to the refined values from the first data set. The differences will be mainly caused by the measurement errors. It is this type of bias of the test set that (at least) I mean. After convergence of refinement against the second data set, both R and Rfree will then be very similar for the two data sets.
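As an illustration of "using the same indices for working and test set", here is a minimal sketch of carrying test-set flags from the first data set over to the second by Miller index. The function name `transfer_free_flags` and the dictionary layout are hypothetical, and the sketch assumes both data sets are indexed consistently and reduced to the same asymmetric unit; it is not how any particular program does it.

```python
from typing import Dict, Iterable, List, Tuple

Hkl = Tuple[int, int, int]  # Miller index (h, k, l)


def transfer_free_flags(
    flags_first: Dict[Hkl, bool],
    hkls_second: Iterable[Hkl],
) -> Tuple[Dict[Hkl, bool], List[Hkl]]:
    """Copy test-set membership from the first data set to the second.

    flags_first maps each Miller index of the first data set to True
    (test set) or False (working set). Reflections that occur only in
    the second data set are returned separately so that the caller can
    decide how to flag them.
    """
    transferred: Dict[Hkl, bool] = {}
    unassigned: List[Hkl] = []
    for hkl in hkls_second:
        if hkl in flags_first:
            transferred[hkl] = flags_first[hkl]
        else:
            unassigned.append(hkl)
    return transferred, unassigned


if __name__ == "__main__":
    first = {(1, 0, 0): True, (0, 1, 0): False}
    second = [(1, 0, 0), (0, 1, 0), (0, 0, 2)]
    flags, missing = transfer_free_flags(first, second)
    print(flags)
    print(missing)
```

Reflections measured only in the second data set have no flag to inherit; how they are assigned is left to the caller in this sketch.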


Best regards,

Dirk.

On 24.09.2009 at 11:56, Ian Tickle wrote:

Hi, I beg to disagree with the 'perceived wisdom', including just about
everyone on this BB, but my answer is NO, there should be no bias -
*provided* you do the subsequent refinement properly. First off, Rfree
is useless as any kind of statistical measure of overfitting etc
*unless* the refinement has converged to the point of maximum log
likelihood against the current working set. So it's meaningless to say
that Rfree is biased 'initially' i.e. *before* any further refinement is
done using the new data because Rfree with the new data has no meaning
at that point - it's neither biased nor unbiased, it's just meaningless!
In any case why would one want to report an Rfree *before* refinement -
what use is it?

So we can only sensibly talk about the Rfree values *after* the further
refinement has converged - and if the refinement hasn't converged then
Rfree bias is the least of your worries!  So are people really saying
that the Rfree at convergence using the new data is biased? For that to
be true it would have to be possible to arrive at a different unbiased
Rfree from another starting point.  But provided your starting point
wasn't a local maximum LL and you haven't gotten into a local maximum
along the way, convergence will be to a unique global maximum of the LL,
so the Rfree must be the same whatever starting point is used (within
the radius of convergence of course).

The other cures suggested such as SA and randomisation are IMO at best a
waste of time and effort (i.e. it will take longer for subsequent
refinement to recover from the shock to the system), and at worst likely
to be worse than the disease they purport to cure.  For example how do
you know what RMS shift to use in the randomisation without causing the
structure to jump into a local maximum LL: the resulting Rfree will
certainly be biased then!

There is of course a different issue (and maybe this is what is
confusing some people) of comparing Rfree's from different test sets: we
showed that this introduces a random relative error in Rfree of
1/sqrt(2*Nfree) (where Nfree = size of test set).  However this effect
is not bias, it's random sampling error.
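To give a feel for the size of this sampling error, here is a small sketch that evaluates 1/sqrt(2*Nfree) for a few test-set sizes. The function names, the example sizes and the Rfree value of 0.25 are illustrative assumptions, not values from this thread.

```python
import math


def rfree_relative_error(n_free: int) -> float:
    """Relative random sampling error of Rfree: 1/sqrt(2*Nfree)."""
    if n_free <= 0:
        raise ValueError("n_free must be a positive number of reflections")
    return 1.0 / math.sqrt(2.0 * n_free)


def rfree_absolute_error(rfree: float, n_free: int) -> float:
    """Absolute sampling error: Rfree times its relative error."""
    return rfree * rfree_relative_error(n_free)


if __name__ == "__main__":
    # Illustrative test-set sizes and an assumed Rfree of 0.25.
    for n_free in (500, 1000, 2000):
        rel = rfree_relative_error(n_free)
        abs_err = rfree_absolute_error(0.25, n_free)
        print(f"Nfree={n_free}: relative {rel:.4f}, absolute {abs_err:.4f}")
```

Because the error falls off only as the square root of Nfree, quadrupling the test set halves it.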

Cheers

-- Ian
-----Original Message-----
From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk] On Behalf Of Mike England
Sent: 24 September 2009 04:31
To: CCP4BB@JISCMAIL.AC.UK
Subject: Rfree in similar data set

Hi all,

I would appreciate your comments on the following case:

I have two datasets from the same or identical crystals. Initially, I
refine a structure against the first dataset and later on switch to
the other dataset for further refinement.
Do you think my Rfree will be biased, as Rfree reflections in the second
dataset may in fact be Rwork reflections in the previous dataset?

Thanks in advance,

Mike



*******************************************************
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:    +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:    www.genzentrum.lmu.de
*******************************************************
