Dear Ian,
many thanks for your explanations - they've changed my view! I was
always a bit puzzled by the supposedly contradictory transition between
restraints and constraints with increasing weight, which has been
clarified by their effect on the number of parameters, and not on the
number of observations.
Interestingly, in your Acta Cryst paper, restraints are also counted as
observations (for instance, in Table 1 and §2.3), but in the derived
residuals and ratios, it's clear that they reduce the effective number
of parameters.
Best regards,
Dirk.
On 20.09.10 13:22, Ian Tickle wrote:
Hi Dirk
First, constraints are just a special case of restraints in the limit
of infinite weights; in fact one way of getting constraints is simply
to use restraints with very large weights (though not so large that
you get rounding problems). These 'pseudo-constraints' will be
indistinguishable in effect from the 'real thing'. So why treat
restraints and constraints differently as far as the statistics are
concerned? The difference is purely one of implementation.
Second, restraints are not interchangeable 1-for-1 with X-ray data as
far as the statistics are concerned: N restraints cannot be considered
as equivalent to N X-ray data, which would be the implication of
adding together the number of restraints and the number of X-ray data.
This can be seen in the estimation of the expected values of the
residuals (chi-squared) for the working and test sets, which are used to
estimate the expected Rfree. If you take a look at our 1998 AC paper
(D54, 547-557), Table 2 (p.551), the last row of the table (labelled
'RGfree/RG') shows the expected residuals for the working set
(denominator) and test set (numerator) for the cases of no restraints,
restrained and constrained refinement:
No restraints (or constraints):
<Dwork> = f - m
<Dfree> = f + m
Restrained:
<Dwork> = f - (m - r + Drest)
<Dfree> = f + (m - r + Drest)
Constrained:
<Dwork> = f - (m - r)
<Dfree> = f + (m - r)
where:
<Dwork> = expected working set residual (chi-squared),
<Dfree> = expected test set residual (chi-squared),
f = no of reflections in working set,
m = no of parameters,
r = no of restraints and/or constraints,
Drest = restraint residual (chi-squared).
The constrained case is obviously just a special case of the
restrained case with Drest = 0, i.e. in the constrained case the
difference between the refined and target values is zero, and the 'no
restraints' case is a special case of this with r = 0. We can
generalise all of this by writing simply:
<Dwork> = f - m'
<Dfree> = f + m'
where m' is the effective no of parameters corrected for restraints
and/or constraints (m' = m - r + Drest); the effective no of
parameters is reduced whether you're using restraints or constraints.
In the case where you had both restraints and constraints, r would be
the total no of restraints + constraints; however, constraints
contribute nothing to Drest. The 'effectiveness' of a restraint
depends on its contribution to Drest (Z^2): a smaller value means it's
more effective. A contribution of Z^2 = 1 to Drest completely cancels
the effect of increasing r by 1 by adding the restraint (i.e. the
restraint has no effect).
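The bookkeeping above can be sketched numerically. This is only a minimal illustration of the formulas as quoted in this message: the function name expected_residuals and all the example numbers are made up for illustration and are not taken from the paper.

```python
def expected_residuals(f, m, r=0, d_rest=0.0):
    """Return (m_eff, d_work, d_free) for the formulas quoted above.

    f      : no of reflections in the working set
    m      : no of parameters
    r      : no of restraints and/or constraints
    d_rest : restraint residual (chi-squared); 0 for pure constraints
    """
    m_eff = m - r + d_rest   # effective no of parameters, m'
    d_work = f - m_eff       # <Dwork> = f - m'
    d_free = f + m_eff       # <Dfree> = f + m'
    return m_eff, d_work, d_free


# Hypothetical numbers, chosen only to show the three cases.
f, m, r = 20000, 8000, 6000

cases = [
    ("no restraints", expected_residuals(f, m)),
    ("restrained", expected_residuals(f, m, r, d_rest=1500.0)),
    ("constrained", expected_residuals(f, m, r, d_rest=0.0)),
]

for label, (m_eff, d_work, d_free) in cases:
    print(f"{label:14s} m' = {m_eff:8.1f}  "
          f"<Dwork> = {d_work:8.1f}  <Dfree> = {d_free:8.1f}")
```

Setting d_rest equal to r in this sketch reproduces the 'no restraints' numbers, which is the Z^2 = 1 cancellation described above.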
This incidentally shows that the effect of over-fitting (adding
redundant effective parameters) is to reduce the working set and
increase the test set residuals. If you consider the working set
residual in the general case:
<Dwork> = f - (m - r + Drest) = f + r - m - Drest
it certainly appears from this that the number of X-ray data (f) and
the number of restraints (r) are being added.
However if you consider the test set residual:
<Dfree> = f + (m - r + Drest) = f - r + m + Drest
this is clearly not the case. All you can say is that the effective
number of parameters is reduced by the number of restraints +
constraints.
Cheers
-- Ian
On Mon, Sep 20, 2010 at 9:20 AM, Dirk Kostrewa
<kostr...@genzentrum.lmu.de> wrote:
Hi Ian,
On 19.09.10 15:25, Ian Tickle wrote:
Hi Florian,
Tight NCS restraints or NCS constraints (they are essentially the same
thing in effect if not in implementation) both reduce the effective
parameter count on a 1-for-1 basis.
Restraints should not be considered as being added to the pool of
X-ray observations in the calculation of the obs/param ratio, simply
because restraints and X-ray observations can in no way be regarded as
interchangeable (increasing the no of restraints by N is not
equivalent to increasing the no of reflections by N). This becomes
apparent when you try to compute the expected Rfree: the effective
contribution of the restraints has to be subtracted from the parameter
count, not added to the observation count.
I always understood the difference between constraints and restraints to be
that a constraint reduces the number of parameters by fixing certain
parameters, whereas restraints are target values for parameters and as such
can be counted as observations, similarly to the Fobs, which are target
values for the Fcalc (although with different weights). I don't see what is
wrong with this view. Do I misunderstand something?
Best regards,
Dirk.
The complication is that a 'weak' restraint is equivalent to less than
1 parameter (I call it the 'effective no of restraints': it can be
calculated from the chi-squared for the restraint). Obviously no
restraint is equivalent to no parameter, so you can think of it as a
continuous sliding scale from no restraint (effective contribution to
be subtracted from parameter count = 0) through weak restraint (0 <
contribution < 1) through tight restraint (count ~= 1) to constraint
(count = 1).
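One way to put numbers on this sliding scale is to combine it with the relation m' = m - r + Drest quoted earlier in this thread, so that each restraint contributes 1 - Z^2 to the amount subtracted from the parameter count. The sketch below is only an illustration under that reading. It assumes Z = (refined - target) / sigma, which the messages do not spell out; the function name effective_restraint_count and the bond-length values are hypothetical.

```python
def effective_restraint_count(refined, target, sigma):
    """Effective contribution of one restraint, read as 1 - Z^2.

    Assumption: Z = (refined - target) / sigma, the usual weighted
    deviation. The thread itself only says the value is calculated
    from the chi-squared for the restraint.
    """
    z = (refined - target) / sigma
    return 1.0 - z * z


# Hypothetical bond-length restraints: (refined, target, sigma) in Angstrom.
restraints = [
    (1.530, 1.530, 0.02),  # satisfied exactly: behaves like a constraint
    (1.540, 1.530, 0.02),  # Z about 0.5: fairly tight restraint
    (1.550, 1.530, 0.02),  # Z about 1.0: contribution cancels out
]

contributions = [effective_restraint_count(*rst) for rst in restraints]
r = len(restraints)
d_rest = sum(1.0 - c for c in contributions)  # sum of Z^2 over restraints

print("per-restraint contributions:", [round(c, 3) for c in contributions])
print("effective no of restraints (r - Drest):", round(r - d_rest, 3))
```

Note that this reading gives a negative contribution when Z^2 > 1, a case the description above does not cover, so the sketch leaves it unclamped rather than guess.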
Cheers
-- Ian
On Sat, Sep 18, 2010 at 9:23 PM, Florian Schmitzberger
<schmitzber...@crystal.harvard.edu> wrote:
Dear All,
I would have a question regarding the effect of non-crystallographic
symmetry (NCS) on the data:parameter ratio in refinement.
I am working with X-ray data to a maximum resolution of 4.1-4.4 Angstroem,
79 % solvent content, in P6222 space group; with 22 300 unique reflections
and expected 1132 amino acid residues in the asymmetric unit, proper 2-fold
rotational NCS (SAD phased and no high-resolution molecular replacement or
homology model available).
Assuming refinement of x,y,z, B and a polyalanine model (i.e. ca. 5700
atoms), this would equal an observation:parameter ratio of roughly 1:1. This
I think would be equivalent to a "normal" protein with 50 % solvent content,
diffracting to better than 3 Angstroem resolution (from the statistics I
could find, at that resolution a mean data:parameter ratio of ca. 0.9:1 can
be expected for refinement of x,y,z, and individual isotropic B; ignoring
bond angle/length geometrical restraints at the moment).
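The estimate of roughly 1:1 can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers given above; the figure of 5 atoms per polyalanine residue (N, CA, C, O, CB) is an assumption made here to arrive at ca. 5700 atoms.

```python
residues = 1132        # expected residues in the asymmetric unit
atoms_per_residue = 5  # polyalanine: N, CA, C, O, CB (assumed count)
params_per_atom = 4    # x, y, z and individual isotropic B
reflections = 22300    # unique reflections

atoms = residues * atoms_per_residue   # 5660, i.e. "ca. 5700"
parameters = atoms * params_per_atom   # 22640
ratio = reflections / parameters

print(f"atoms = {atoms}, parameters = {parameters}")
print(f"observation:parameter ratio = {ratio:.2f}:1")
```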
My question is how I could factor in the 2-fold rotational NCS for the
estimate of the observations, assuming tight NCS restraints (or even
constraints). It is normally assumed NCS reduces the noise by a factor of the
square root of the NCS order, but I would be more interested in how much it
adds on the observation side (used as a restraint) or how much it reduces
the parameters (used as a constraint). I don't suppose it would be correct to
assume that the 2-fold NCS would halve the number of parameters to refine
(assuming an NCS constraint)?
Regards,
Florian
-----------------------------------------------------------
Florian Schmitzberger
Biological Chemistry and Molecular Pharmacology
Harvard Medical School
250 Longwood Avenue, SGM 130
Boston, MA 02115, US
Tel: 001 617 432 5602
--
*******************************************************
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW: www.genzentrum.lmu.de
*******************************************************