This is hunch-speak, not proper analysis, but it is possible to get huge Fcalc values, and hence large difference-map terms, at low resolution by assuming the solvent volume is a vacuum rather than full of partially ordered water molecules. Babinet scaling can do something to correct this, but it is a very blunt tool. And once a structure is more or less complete the solvent-masked contribution to Fcalc helps, but there is an intermediate stage where spurious differences can distort maps.
As Randy says, if either Eobs or Ecalc is small the FOM is also small. The worst offenders are when Eobs is large but Ecalc is crazy. I like to look at the plot of <Fobs> vs <Fcalc> vs resolution, output by REFMAC along with the R-factor plots. If there are large discrepancies, maybe it is time to worry about scaling options.

Eleanor

PS - But are difference map terms weighted by FOM?

On Thu, 17 Oct 2019 at 08:55, Jan Dohnalek <dohnalek...@gmail.com> wrote:

> Dear all,
> regarding the "remaining strong differences" between measured data and calculated SFs from a finished (high-resolution) structure, I once investigated this a bit, going back to images and looking up some extreme outliers.
> I found the same - those were clear, strong diffraction spots: not ice, not small molecule, genuine protein diffraction. So I had no explanation for them. Some were even "forbidden" intensities, because of screw axes which were correct. The structure refined perfectly, no problems at all.
> I then found some literature about the possibility of multiple reflections - I guess this is possible, but I wonder if you could easily get, say, a 25-sigma I that way.
>
> And as we often end our beer discussions - maybe all protein space groups are actually true P1, just close enough to satisfy the high-symmetry rules .. but this is getting a bit philosophical, I know ..
>
> Jan Dohnalek
>
> On Wed, Oct 16, 2019 at 6:24 PM Randy Read <rj...@cam.ac.uk> wrote:
>
>> James,
>>
>> Where we diverge is with your interpretation that big differences lead to small FOMs. The size of the FOM depends on the product of Fo and Fc, not their difference. The FOM for a reflection where Fo=1000 and Fc=10 is very different from the FOM for a reflection with Fo=5000 and Fc=4010, even though the difference is the same.
>>
>> Expanding on this:
>>
>> 1. The FOM actually depends more on the E values, i.e. reflections smaller than average get lower FOM values than ones bigger than average.
>> In the resolution bin from 5.12 to 5.64 Å of 2vb1, the mean observed intensity is 20687 and the mean calculated intensity is 20022, which means that Eobs = Sqrt(145.83/20687) = 0.084 and Ecalc = Sqrt(7264/20022) = 0.602. This reflection gets a low FOM because the product (0.050) is such a small number, not because the difference is big.
>>
>> 2. You have to consider the role of the model error in the difference, because for precisely measured data most of the difference comes from model error. In this resolution shell, the correlation coefficient between Iobs and Fcalc^2 is about 0.88, which means that sigmaA is about Sqrt(0.88) = 0.94. The variance of both the real and imaginary components of Ec (as an estimate of the phased true E) will be (1-0.94^2)/2 = 0.058, so the standard deviations of the real and imaginary components of Ec will be about 0.24. In that context, the difference between Eobs and Ecalc is nothing like a 2000-sigma outlier.
>>
>> Looking at this another way, the reason why the FOM is low for this reflection is that the conditional probability distribution of Eo given Ec has significant values on the other side of the origin of the complex plane. That means that the *phase* of the complex Eo is very uncertain. The figures on this web page (https://www-structmed.cimr.cam.ac.uk/Course/Statistics/statistics.html) should help to explain that idea.
>>
>> Best wishes,
>>
>> Randy
>>
>> On 16 Oct 2019, at 16:02, James Holton <jmhol...@lbl.gov> wrote:
>>
>> All very true Randy,
>>
>> But nevertheless every hkl has an FOM assigned to it, and that is used to calculate the map. Statistical distribution or not, the trend is that hkls with big amplitude differences get smaller FOMs, which means large model-to-data discrepancies are down-weighted. I wonder sometimes at what point this becomes a self-fulfilling prophecy?
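To make the product argument concrete: for an acentric reflection, the FOM in the sigmaA formulation is m = I1(X)/I0(X) with X = 2·sigmaA·Eobs·Ecalc/(1 - sigmaA^2), so m is driven by the product Eobs·Ecalc. A minimal Python sketch using the shell statistics quoted above; the dependency-free Bessel evaluation and the "both-Es-above-average" comparison values are my own additions for illustration:

```python
import math

def bessel_i(n, x, steps=2000):
    """I_n(x) via the integral form (1/pi) * int_0^pi exp(x cos t) cos(n t) dt,
    evaluated with the trapezoid rule to avoid any external dependency."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(x * math.cos(t)) * math.cos(n * t)
    return total * h / math.pi

def fom_acentric(E_obs, E_calc, sigmaA):
    """Acentric figure of merit m = I1(X)/I0(X), X = 2*sigmaA*Eo*Ec/(1 - sigmaA^2)."""
    X = 2.0 * sigmaA * E_obs * E_calc / (1.0 - sigmaA ** 2)
    return bessel_i(1, X) / bessel_i(0, X)

# Reflection -5,2,2 of 2vb1 in the 5.12-5.64 A shell: Eobs=0.084, Ecalc=0.602, sigmaA~0.94
m_weak = fom_acentric(0.084, 0.602, 0.94)
# A hypothetical reflection with both E values above average gets a much higher FOM
m_strong = fom_acentric(1.5, 1.4, 0.94)
```

Note that m_weak comes out low even though sigmaA is high, purely because the product of the E values is small, which is the point being made above.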
>> If you look in detail at the Fo-Fc differences in pretty much any refined structure in the PDB you will find huge outliers. Some are hundreds of sigmas, and they can go in either direction.
>>
>> Take for example reflection -5,2,2 in the highest-resolution lysozyme structure in the PDB: 2vb1. Iobs(-5,2,2) was recorded as 145.83 ± 3.62 (at 5.4 Ang) with Fcalc^2(-5,2,2) = 7264. A 2000-sigma outlier! What are the odds? On the other hand, Iobs(4,-6,2) = 1611.21 ± 30.67 vs Fcalc^2(4,-6,2) = 73, which is in the opposite direction. One can always suppose "experimental errors", but ZD sent me these images and I have looked at all the spots involved in these hkls. I don't see anything wrong with any of them. The average multiplicity of this data set was 7.1 and involved 3 different kappa angles, so I don't think these are "zingers" or other weird measurement problems.
>>
>> I'm not just picking on 2vb1 here. EVERY PDB entry has this problem. Not sure where it comes from, but the FOM assigned to these huge differences is always small, so whatever is causing them won't show up in an FOM-weighted map.
>>
>> Is there a way to "change up" the statistical distribution that assigns FOMs to hkls? Or are we stuck with this systematic error?
>>
>> -James Holton
>> MAD Scientist
>>
>> On 10/4/2019 9:31 AM, Randy Read wrote:
>>
>> Hi James,
>>
>> I'm sure you realise this, but it's important for other readers to remember that the FOM is a statistical quantity: we have a probability distribution for the true phase, we pick one phase (the "centroid" phase that should minimise the RMS error in the density map), and then the FOM is the expected value of the cosine of the phase error, obtained by taking the cosines of all possible phase differences and weighting by the probability of that phase difference. Because it's a statistical quantity from a random distribution, you really can't expect this to agree reflection by reflection!
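Whether -5,2,2 really is a "2000-sigma" outlier depends on which sigma you divide by. A back-of-the-envelope check, using only numbers quoted in this thread; the normalisation is a rough application of Randy's sigmaA argument, not any refinement program's actual outlier test:

```python
import math

# Reflection -5,2,2 of 2vb1 (values quoted in the thread)
I_obs, sig_I = 145.83, 3.62
F2_calc = 7264.0
mean_I_obs, mean_I_calc = 20687.0, 20022.0   # 5.12-5.64 A shell averages
sigmaA = math.sqrt(0.88)                      # from CC(Iobs, Fcalc^2) ~ 0.88

# Naive view: difference measured in units of the experimental sigma alone
z_measurement = (I_obs - F2_calc) / sig_I     # about -1966: the "2000-sigma outlier"

# Model-error view: compare normalised amplitudes, where the expected value of
# Eo is sigmaA*Ec and each component has variance (1 - sigmaA^2)/2
E_obs = math.sqrt(I_obs / mean_I_obs)
E_calc = math.sqrt(F2_calc / mean_I_calc)
sd = math.sqrt((1.0 - sigmaA ** 2) / 2.0)
z_model = (E_obs - sigmaA * E_calc) / sd      # about -2: unremarkable
```

The same discrepancy is enormous against the measurement error but ordinary once model error dominates, which is exactly the disagreement between the two views in this thread.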
>> It's a good start to see that the overall values are good, but if you want to look more closely you have to look at groups of reflections, e.g. bins of resolution, bins of observed amplitude, bins of calculated amplitude. However, each bin has to have enough members that the average will generally be close to the expected value.
>>
>> Best wishes,
>>
>> Randy Read
>>
>> On 4 Oct 2019, at 16:38, James Holton <jmhol...@lbl.gov> wrote:
>>
>> I've done a few little experiments over the years using simulated data where I know the "correct" phase, trying to see just how accurate FOMs are. What I have found in general is that overall FOM values are fairly well correlated with overall phase error, but if you go reflection-by-reflection they are terrible. I suppose this is because FOM estimates are rooted in amplitudes. Good agreement in amplitude gives you more confidence in the model (and therefore the phases), but if your R factor is 55% then your phases probably aren't very good either. However, if you look at any given h,k,l those assumptions become less and less applicable. Still, it's the only thing we've got.
>>
>> At the end of the day, the phase you get out of a refinement program is the phase of the model. All those fancy "FWT" coefficients with "m" and "D" or "FOM" weights are modifications to the amplitudes, not the phases. The phases in your 2mFo-DFc map are identical to those of just an Fc map. Seriously, have a look! Sometimes you will get a 180-degree flip to keep the sign of the amplitude positive, but that's it. Nevertheless, the electron density of a 2mFo-DFc map is closer to the "correct" electron density than that of any other map. This is quite remarkable considering that the "phase error" is the same.
>>
>> This realization is what led my colleagues and me to forget about "phase error" and start looking at the error in the electron density itself (10.1073/pnas.1302823110).
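The point about 2mFo-DFc coefficients keeping the model phase is easy to verify numerically: the m and D weights only rescale the amplitude, and the only possible "phase change" is a 180-degree flip when the reweighted amplitude goes negative. A small sketch, with function name and numbers invented for illustration:

```python
import cmath
import math

def two_mfo_dfc(m, F_obs, D, F_calc):
    """Map coefficient (2m|Fo| - D|Fc|) * exp(i*phi_calc).

    m, D: FOM and sigmaA-derived weights; F_obs: observed amplitude (real);
    F_calc: complex model structure factor. Only the amplitude is reweighted;
    the phase is taken straight from the model."""
    amp = 2.0 * m * F_obs - D * abs(F_calc)
    return cmath.rect(amp, cmath.phase(F_calc))

Fc = cmath.rect(1000.0, 0.7)                # model SF with phase 0.7 rad

good = two_mfo_dfc(0.9, 1200.0, 0.95, Fc)   # reweighted amplitude positive:
                                            # phase of the coefficient is exactly 0.7
weak = two_mfo_dfc(0.3, 100.0, 0.95, Fc)    # reweighted amplitude negative:
                                            # phase flips by pi, as described above
```

However the weights are chosen, the map phase is the model phase (up to that sign flip), which is what makes the density-space error analysis in the PNAS paper attractive.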
>> We did this rather pedagogically. Basically, pretend you did the whole experiment again, but "change up" the source of error of interest. For example, if you want to see the effect of sigma(F), then you add random noise with the same magnitude as sigma(F) to the Fs, and then re-refine the structure. This gives you your new phases, and a new map. Do this 50 or so times and you get a pretty good idea of how any source of error of interest propagates into your map. There is even a little feature in Coot for animating these maps, which gives a much more intuitive view of the "noise". You can also look at the variation of model parameters, like the refined occupancy of a ligand, which is a good way to put an "error bar" on it. The trick is finding the right source of error to propagate.
>>
>> -James Holton
>> MAD Scientist
>>
>> On 10/2/2019 2:47 PM, Andre LB Ambrosio wrote:
>>
>> Dear all,
>>
>> How is the phase error estimated for any given reflection, specifically in the context of model refinement? In terms of math, I mean.
>>
>> How useful is FOM in assessing the phase quality, when not for initial experimental phases?
>>
>> Many thanks in advance,
>>
>> Andre.
>>
>> ------
>> Randy J. Read
>> Department of Haematology, University of Cambridge
>> Cambridge Institute for Medical Research
>> The Keith Peters Building
>> Hills Road
>> Cambridge CB2 0XY, U.K.
>> Tel: +44 1223 336500   Fax: +44 1223 336827
>> E-mail: rj...@cam.ac.uk
>> www-structmed.cimr.cam.ac.uk
>
> --
> Jan Dohnalek, Ph.D
> Institute of Biotechnology
> Academy of Sciences of the Czech Republic
> Biocev
> Prumyslova 595
> 252 50 Vestec near Prague
> Czech Republic
>
> Tel. +420 325 873 758

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
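The noise-propagation recipe described in the thread (perturb the data by sigma(F), re-refine, repeat ~50 times, look at the spread of whatever you care about) can be sketched with a toy stand-in for refinement. Here a single least-squares scale factor plays the role of the refined parameter, and all data, sigmas, and trial counts are synthetic, purely to illustrate the resampling idea:

```python
import random
import statistics

def lsq_scale(F_obs, F_calc):
    """Least-squares scale k minimising sum((Fo - k*Fc)^2): our toy 'refinement'."""
    return sum(o * c for o, c in zip(F_obs, F_calc)) / sum(c * c for c in F_calc)

def propagate(F_obs, sig_F, F_calc, n_trials=200, seed=42):
    """Re-'refine' against noise-perturbed data many times; the spread of the
    refined parameter is its error bar with respect to that noise source."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        shaken = [rng.gauss(f, s) for f, s in zip(F_obs, sig_F)]
        trials.append(lsq_scale(shaken, F_calc))
    return statistics.mean(trials), statistics.stdev(trials)

# Synthetic example: true scale 1.1, uniform sigma(F) = 2.0
F_calc = [100.0, 150.0, 200.0, 250.0, 300.0]
F_obs = [1.1 * f for f in F_calc]
mean_k, sd_k = propagate(F_obs, [2.0] * len(F_obs), F_calc)
```

In real use, lsq_scale would be replaced by an actual refinement run and mean_k/sd_k by the map or parameter of interest, but the propagate loop is the whole trick.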