I would only like to reiterate a small comment I posted before:
Should the cell parameters be inaccurate, optimization of weights by cross-validation (getting the best Rfree) will result in a 'higher' RMSD. It is easy to see why: if a cell is measured to be 1% larger than in reality, all bonds would 'prefer' to be 1% longer than the 'correct' dictionary values. Satisfying that requires a higher RMSD, and that structure would have the lowest Rfree because the X-ray data would be fitted better.

I actually think that inaccurate cells are a big source of misery in many refinements. I have found the WhatCheck idea of checking your cell by looking at the projection of bond lengths of certain types along the cell axes most useful. I would hardly advocate measuring your cell that way, but going back to your data and looking at the cell again would be worth it.
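As a rough illustration of the argument above, here is a minimal sketch of how large an RMSD a 1% cell error could produce on its own. It assumes the error is isotropic and that every bond simply stretches by the same fraction; the bond lengths are approximate, illustrative values for typical protein backbone bonds, not taken from any particular dictionary.

```python
import math

# Approximate, illustrative "dictionary" bond lengths in Angstrom (assumed values).
ideal = [1.53, 1.33, 1.46, 1.23]

def rmsd_from_cell_error(ideal_lengths, scale_error):
    """RMSD from ideality if every bond is stretched by the same fractional error."""
    deviations = [d * scale_error for d in ideal_lengths]
    return math.sqrt(sum(x * x for x in deviations) / len(deviations))

# A cell measured 1% too large (assumed isotropic).
rmsd = rmsd_from_cell_error(ideal, 0.01)
print(f"RMSD from a 1% cell error alone: {rmsd:.4f} A")
```

Under these assumptions the cell error alone accounts for an RMSD of roughly 0.014 Å, which is comparable to the target values discussed in this thread.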
To make it more fun, cells change during radiation damage, so ...

best regards,
Tassos

On 9 Jan 2008, at 20:15, Ian Tickle wrote:
Hi William & others,

Indeed, phenix.refine uses cross-validation to optimise the scaling of the X-ray & B-factor weights. All I did was demonstrate that you can do essentially the same thing as phenix.refine but using Refmac instead. I don't claim to have done anything new, except I modified Refmac to print out the free likelihood and used that as a target function instead of Rfree, as suggested by Gerard Bricogne in Meth. Enzymol. (1997) 276, 361-423. Whatever value of the RMSD (or better the RMS Z-score) comes out of that, you can be sure that it's based purely objectively on the experimental data, not on completely arbitrary and unjustifiable subjective choices, which is what Jaskolski et al. appear to be suggesting. Cross-validation is a well-established methodology in statistics, it's certainly not 'numerology'!

Of course then you have to come up with some theory to explain the experimental results, i.e. why the RMSD that comes out must always be <= the RMS standard uncertainty, but actually that's not difficult since the RMSD is related to the accuracy and the SU is related to the precision, and on the face of it there's no reason why these should be related at all (as Gerard nicely demonstrated with his dartboard analogy in Leeds!). Jaskolski et al.'s theory that always RMSD = <SU> regardless of resolution just doesn't fit the experimental results, and as every good scientist knows, it only takes one ugly fact to destroy a beautiful theory.

As you point out, setting a target value of 0.02 Ang or higher for the RMSD bonds and similarly for the angles, unless you have very high resolution data, will inevitably result in take-up of some fraction of the random experimental errors into the refined parameters, in order to inflate the RMSD/RMSZ's to their target values and reduce Rwork at the expense of Rfree - otherwise known as overfitting! It's not recommended practice to deliberately cause random errors (however small) to be added to your co-ordinates!
This is obvious if you think about what happens at low resolution: there's no justification for refining individual xyz & B's, so the optimal procedure is to use constrained refinement with the torsion angles as parameters, or restrained refinement with *very* tight restraints (if that's feasible). Whether you use constrained refinement or its restrained equivalent, it will keep the bond lengths & angles fixed at the initial dictionary values so the RMSD's will be identically zero, or very nearly so, throughout the refinement.

Someone mentioned 'experienced crystallographers': actually since the distinction between RMSD & SU is purely a question of statistics not of crystallography, any crystallographic experience is unlikely to be relevant!

The other question you raised is why Refmac doesn't refine the RMSD's much nearer to zero - this is something I also commented on; also why the Rfree & LLfree plots are so noisy compared with those from CNS & phenix.refine. I think it's to do with rounding errors in the gradient calculation and/or optimisation code. Refmac may be using single precision, whereas phenix.refine may be using double - I'm just guessing, maybe the programmers could comment? This is something I would like to see improved, in order to make cross-validation with Refmac more reliable & useful.

Cheers

-- Ian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of William Scott
Sent: 09 January 2008 17:32
To: William Scott
Cc: ccp4bb@jiscmail.ac.uk
Subject: Re: [ccp4bb] bond lengths, angles, ideality and refinements

Sorry, that should have read "because the value is established by social consensus, it is thus NOT guaranteed to be perfectly accurate, ..."

In other words, one can imagine some source of systematic error in establishing an ideal bond length. For example, the crystal packing environment of small molecules might tend to distort a bond by a couple hundredths of an Ångstrom.
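The distinction drawn above between accuracy (a systematic offset) and precision (scatter) can be made concrete with a few made-up numbers. This is only a sketch: the "true" length and the measurements below are hypothetical, chosen so that the offset is a couple of hundredths of an Ångstrom.

```python
import statistics

# Hypothetical numbers, for illustration only.
true_length = 1.530                                   # assumed "true" bond length (Angstrom)
measurements = [1.548, 1.551, 1.549, 1.552, 1.550]    # tightly clustered, but offset

mean = statistics.mean(measurements)
bias = mean - true_length                  # accuracy: systematic offset from the truth
spread = statistics.stdev(measurements)    # precision: sample scatter about the mean

print(f"bias (accuracy)    = {bias:.3f} A")
print(f"spread (precision) = {spread:.4f} A")
```

Here the values agree with each other to within about 0.002 Å, yet their mean is off by 0.02 Å: a precise consensus value that is not accurate, and with no reason for the two numbers to be related.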
William Scott wrote:

Dear Yang Li:

Happy New Year to you, too, (ahead of Feb. 7th).

You certainly owe us no apology; the reverse may not be true. Your question is an important one, as is what you have written below.

I'm not certain I have a completely satisfactory answer. The reason is that ideal bond lengths may or may not be "true" in the sense that the value is established by social consensus, and is thus guaranteed to be perfectly accurate, even though it may be quite precise.

Because of this, and because of natural deviations from ideality (which really only become trustworthy observations at extremely high resolution), a certain amount of "wiggle room" is typically allowed in terms of rmsd. The more conservative the refinement, the smaller the rmsd from ideality will be.

Some people believe 0.02 Å deviation from ideality is reasonable, based on the accuracy of the dictionary values of bond lengths and angles; others consider that to be "too sloppy" and a way to artificially deflate R factors.

I seem to have detected a tendency in the literature to aim for about 0.01 Å deviation. The new refinement program phenix.refine, which is supposed to optimize weighting between X-ray terms and stereochemical constraints automatically, seems to settle in at quite conservative values, such as 0.005 Å, whereas with refmac, I can't seem to get the geometry any more ideal than 0.005 Å even if I try to idealize a structure in the absence of X-ray data.

So, like you, I am a bit confused, and wouldn't mind hearing more from the experts.

All the best,
Bill

yang li wrote:

Dear All,

I am very sorry to have involved you in such an insignificant discussion. I have reached agreement with Prof Gerard, please stop talking about things beyond science, thanks!

I read a book today, which said "A refined model should exhibit rms deviations of no more than 0.02 Å for bond length and 4° for bond angles". I just wonder about the standard for the bond length and the bond angle. I think most of you have read similar words!
But maybe I did not express myself clearly and made some phrasing mistakes.

At last, happy new year to you all--though very late!

Sincerely!
Yang Li
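For readers wondering how the "rms deviation" figures quoted in this thread are obtained, the following is a minimal sketch of the two statistics mentioned, the RMSD and the RMS Z-score, for bond lengths. All numbers are hypothetical: the refined lengths, the ideal values, and the 0.02 Å standard deviations are assumed for illustration and do not come from any particular restraint dictionary or refinement program.

```python
import math

def rmsd(observed, ideal):
    """Root-mean-square deviation of observed values from their ideal values."""
    return math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(observed))

def rmsz(observed, ideal, sigma):
    """Root-mean-square Z-score: each deviation is divided by its standard deviation."""
    z_squared = [((o - i) / s) ** 2 for o, i, s in zip(observed, ideal, sigma)]
    return math.sqrt(sum(z_squared) / len(z_squared))

# Hypothetical refined bond lengths, ideal values and sigmas (Angstrom).
observed = [1.540, 1.325, 1.470, 1.235]
ideal = [1.530, 1.330, 1.460, 1.230]
sigma = [0.020, 0.020, 0.020, 0.020]

bond_rmsd = rmsd(observed, ideal)
bond_rmsz = rmsz(observed, ideal, sigma)
print(f"RMSD bonds = {bond_rmsd:.4f} A, RMSZ bonds = {bond_rmsz:.3f}")
```

The Z-score form puts bonds with different expected variability on a common scale, so it can be compared across bond types; the RMSD is in Å and is the quantity the "0.02 Å" rule of thumb refers to.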