Re: [ccp4bb] should the final model be refined against full datset

Quyen Hoang Fri, 14 Oct 2011 15:20:09 -0700

Thanks for the clear explanation. I understood that.

But I was trying to understand how this would negatively affect the initial model to render it useless or less useful. In the scenario that you presented, I would expect a better result (better model) if the initial model was refined with all data, thus more useful. Sure, again in your scenario, the "new" structure has seen R-free reflections in the equivalent indexes of its replacement model, but their intensities should be different anyway, so I am not sure how this is bad. Even if the bias is huge, let's say this bias results in a 1% reduction in initial R-free (exaggerating here), how would this make one's model bad or how would this be bad for one's science? In the end, our objective is to build the best model possible and I think that more data would likely result in a better model, not the other way around. If we can agree that refining a model with all data would result in a better model, then wouldn't not doing so constitute a compromise of model quality for a more "pure" statistic?

I have not refined a model with all data before (just to keep in line), but I wondered if I was doing the best thing.


Cheers,
Quyen
On Oct 14, 2011, at 5:27 PM, Phil Jeffrey wrote:

Let's say you have two isomorphous crystals of two different protein-ligand complexes. Same protein, different ligand, same xtal form. Conventionally you'd keep the same free set reflections (hkl values) between the two datasets to reduce biasing. However if the first model had been refined against all reflections there is no longer a free set for that model, thus all hkl's have seen the atoms during refinement, and so your R-free in the second complex is initially biased to the model from the first complex. [*]
The tendency is to do less refinement in these sorts of isomorphous cases than in molecular replacement solutions, because the structural changes are usually far less (it is isomorphous after all) so there's a risk that the R-free will not be allowed to fully float free of that initial bias. That makes your R-free look better than it actually is.
This is rather strongly analogous to using different free sets in the two datasets.
However I'm not sure that this is as big of a deal as it is being made to sound. It can be dealt with straightforwardly. However refining against all the data weakens the use of R-free as a validation tool for that particular model so the people that like to judge structures based on a single number (i.e. R-free) are going to be quite put out.
It's also the case that the best model probably *is* the one based on a careful last round of refinement against all data, as long as nothing much changes. That would need to be quantified in some way(s).
Phil Jeffrey
Princeton
[* Your R-free is also initially model-biased in cases where the data are significantly non-isomorphous or you're using two different xtal forms, to varying extents]
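
The free-set bookkeeping described in the message above can be illustrated with a small sketch. It is not taken from any refinement package: the function names are hypothetical, reflections are represented as plain (h, k, l) tuples, and both datasets are assumed to be indexed consistently and reduced to the same asymmetric unit. It only shows why copying flags by hkl keeps the second dataset's free set unseen, and why that guarantee disappears if the first model was refined against all reflections.

```python
import random


def assign_free_flags(hkls, fraction=0.05, seed=0):
    """Randomly mark roughly `fraction` of the reflections as free (test) set.

    Returns a dict mapping each (h, k, l) tuple to True (free) or False (work).
    """
    rng = random.Random(seed)
    return {hkl: rng.random() < fraction for hkl in hkls}


def transfer_free_flags(reference_flags, new_hkls, fraction=0.05, seed=1):
    """Carry free flags over to a second, isomorphous dataset by Miller index.

    Reflections also present in the reference keep the reference flag;
    reflections only present in the new dataset get a fresh random flag.
    """
    rng = random.Random(seed)
    flags = {}
    for hkl in new_hkls:
        if hkl in reference_flags:
            flags[hkl] = reference_flags[hkl]
        else:
            flags[hkl] = rng.random() < fraction
    return flags


def leaked_free_reflections(new_flags, reference_work_set):
    """Free reflections of the new dataset whose indices the reference model
    was refined against, i.e. the indices where R-free starts out biased."""
    return {
        hkl
        for hkl, is_free in new_flags.items()
        if is_free and hkl in reference_work_set
    }
```

Passing only the work reflections of the first dataset as `reference_work_set` gives an empty leaked set for the shared indices. Passing every reflection of the first dataset, which is the "refined against all data" case, returns every shared free reflection.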
I still don't understand how a structure model refined with all data
would negatively affect the determination and/or refinement of an
isomorphous structure using a different data set (even without doing SA
first).

Quyen

On Oct 14, 2011, at 4:35 PM, Nat Echols wrote:
On Fri, Oct 14, 2011 at 1:20 PM, Quyen Hoang <qqho...@gmail.com
<mailto:qqho...@gmail.com>> wrote:

   Sorry, I don't quite understand your reasoning for how the
   structure is rendered useless if one refined it with all data.


"Useless" was too strong a word (it's Friday, sorry). I guess
simulated annealing can address the model-bias issue, but I'm not
totally convinced that this solves the problem. And not every
crystallographer will run SA every time he/she solves an isomorphous
structure, so there's a real danger of misleading future users of the PDB file. The reported R-free, of course, is still meaningless in the
context of the deposited model.

   Would your argument also apply to all the structures that were
   refined before R-free existed?


Technically, yes - but how many proteins are there whose only
representatives in the PDB were refined this way? I suspect very few;
in most cases, a more recent model should be available.

-Nat
