Thanks for the clear explanation. I understood that.
But I was trying to understand how this would negatively affect the
initial model so as to render it useless or less useful.
In the scenario that you presented, I would expect a better result
(better model) if the initial model was refined with all data, thus
more useful.
Sure, again in your scenario, the "new" structure has seen R-free
reflections in the equivalent indexes of its replacement model, but
their intensities should be different anyway, so I am not sure how
this is bad. Even if the bias is huge, let's say this bias results in
a 1% reduction in initial R-free (exaggerating here), how would this
make one's model bad or how would this be bad for one's science?
In the end, our objective is to build the best model possible and I
think that more data would likely result in a better model, not the
other way around. If we can agree that refining a model with all data
would result in a better model, then wouldn't not doing so constitute
a compromise of model quality for a more "pure" statistic?
I had not refined a model with all data before (just to keep in line),
but I wondered if I was doing the best thing.
Cheers,
Quyen
On Oct 14, 2011, at 5:27 PM, Phil Jeffrey wrote:
Let's say you have two isomorphous crystals of two different protein-
ligand complexes. Same protein, different ligand, same xtal form.
Conventionally you'd keep the same free set reflections (hkl values)
between the two datasets to reduce bias. However, if the first
model had been refined against all reflections there is no longer a
free set for that model, thus all hkl's have seen the atoms during
refinement, and so your R-free in the second complex is initially
biased to the model from the first complex. [*]
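
The convention described above, keeping the same hkl indices flagged as free in both datasets, can be sketched roughly as follows. This is only an illustration in plain Python: the function names, the 5% free fraction, and the toy index lists are assumptions, and it presumes both datasets are already indexed in the same asymmetric unit. In practice one would use the free-flag tools of a crystallographic package rather than code like this.

```python
import random

def assign_free_flags(hkls, fraction=0.05, seed=0):
    """Randomly flag roughly `fraction` of reflections as the free (test) set."""
    rng = random.Random(seed)
    return {hkl: rng.random() < fraction for hkl in hkls}

def transfer_free_flags(reference_flags, new_hkls, fraction=0.05, seed=1):
    """Copy free flags from a reference dataset to a second dataset by Miller index.

    Reflections present only in the second dataset receive fresh random flags.
    """
    rng = random.Random(seed)
    flags = {}
    for hkl in new_hkls:
        if hkl in reference_flags:
            # Same hkl as in the first complex: keep it in the same set.
            flags[hkl] = reference_flags[hkl]
        else:
            flags[hkl] = rng.random() < fraction
    return flags

# Toy Miller indices standing in for two isomorphous datasets (hypothetical).
first = [(h, k, l) for h in range(6) for k in range(6) for l in range(6)]
second = [(h, k, l) for h in range(7) for k in range(6) for l in range(6)]

flags_first = assign_free_flags(first)
flags_second = transfer_free_flags(flags_first, second)
```

If the first model was refined against all reflections, there is no `flags_first` to carry over, which is the situation the paragraph above describes.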
The tendency is to do less refinement in these sorts of isomorphous
cases than in molecular replacement solutions, because the
structural changes are usually far smaller (it is isomorphous after
all), so there's a risk that the R-free will not be allowed to fully
float free of that initial bias. That makes your R-free look better
than it actually is.
This is rather strongly analogous to using different free sets in
the two datasets.
However I'm not sure that this is as big of a deal as it is being
made to sound. It can be dealt with straightforwardly. That said,
refining against all the data weakens the use of R-free as a
validation tool for that particular model, so the people who like to
judge structures based on a single number (i.e. R-free) are going to
be quite put out.
It's also the case that the best model probably *is* the one based
on a careful last round of refinement against all data, as long as
nothing much changes. That would need to be quantified in some
way(s).
Phil Jeffrey
Princeton
[* Your R-free is also initially model-biased, to varying extents, in
cases where the data are significantly non-isomorphous or you're using
two different xtal forms]
I still don't understand how a structure model refined with all data
would negatively affect the determination and/or refinement of an
isomorphous structure using a different data set (even without
doing SA first).
Quyen
On Oct 14, 2011, at 4:35 PM, Nat Echols wrote:
On Fri, Oct 14, 2011 at 1:20 PM, Quyen Hoang <qqho...@gmail.com> wrote:
Sorry, I don't quite understand your reasoning for how the
structure is rendered useless if one refined it with all data.
"Useless" was too strong a word (it's Friday, sorry). I guess
simulated annealing can address the model-bias issue, but I'm not
totally convinced that this solves the problem. And not every
crystallographer will run SA every time he/she solves an isomorphous
structure, so there's a real danger of misleading future users of
the PDB file. The reported R-free, of course, is still meaningless in
the context of the deposited model.
Would your argument also apply to all the structures that were
refined before R-free existed?
Technically, yes - but how many proteins are there whose only
representatives in the PDB were refined this way? I suspect very
few; in most cases, a more recent model should be available.
-Nat