Re: [ccp4bb] Figure of merit in refinement

Randy Read Fri, 18 Oct 2019 01:05:47 -0700

Dear Eleanor,

Yes, difference maps are weighted by FOM, with the coefficient being m*Fo-D*Fc, 
phased by the model. If Fc is small, then m will be small because, even if Fo 
is large, you have no idea what phase to assign to the difference.  If Fc is 
large because you haven't treated bulk solvent, it turns out that D will 
effectively apply a Babinet scaling because D includes a term with the square 
root of the mean observed intensity divided by the mean calculated intensity.  
If Fc is large even with a bulk solvent correction, then you want a big 
negative term in the difference coefficient so that's fine too!
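
A minimal sketch of the coefficient described above, for a single reflection. The function name and all numbers in the example are illustrative assumptions, not values from any refinement program; m and D are simply taken as inputs rather than estimated.

```python
import math

def difference_coefficient(fo, fc, phi_c_deg, m, d):
    """Return the complex mFo-DFc coefficient, phased by the model.

    fo, fc    : observed and calculated amplitudes
    phi_c_deg : model phase in degrees
    m         : figure of merit (0..1)
    d         : the D scale factor
    """
    amplitude = m * fo - d * fc          # signed; may be negative
    phi = math.radians(phi_c_deg)
    return complex(amplitude * math.cos(phi), amplitude * math.sin(phi))

# Made-up inputs (assumptions for illustration only).
# Small Fc: m is small, so the large Fo contributes little.
weak_fc = difference_coefficient(fo=1000.0, fc=10.0, phi_c_deg=30.0, m=0.05, d=0.9)
# Large Fc: the coefficient is a big negative term along the model phase.
big_fc = difference_coefficient(fo=200.0, fc=900.0, phi_c_deg=30.0, m=0.6, d=0.9)

print(abs(weak_fc), abs(big_fc))
```

The signed amplitude is kept inside the complex number, so a negative value shows up as a vector pointing opposite to the model phase.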


Best wishes,

Randy

> On 18 Oct 2019, at 07:31, Eleanor Dodson 
> <0000176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk> wrote:
> 
> This is hunch speak - not proper analysis, but it is possible to get huge 
> Fcalc, and hence large difference map terms,  at low resolution by assuming 
> the solvent volume is a vacuum, not full of partially ordered water 
> molecules. 
> The Babinet scaling can do something to correct this but it is a very blunt 
> tool.  And once a structure is more or less complete the Solvent masked 
> contribution to Fcalc helps, but there is an intermediate stage where 
> spurious differences can distort maps.
> 
> As Randy says - if either Eobs or Ecalc is small the FOM is also small. The 
> worst offenders are when Eobs is large but Ecalc is crazy. 
> I like to look at the plot of <Fobs> v <Fcalc> v resolution, output by REFMAC 
> along with Rfactor plots. If there are large discrepancies
> maybe it is time to worry about scaling options.
> 
> Eleanor
> 
> PS - But are difference map terms weighted by FOM? 
> 
> 
> On Thu, 17 Oct 2019 at 08:55, Jan Dohnalek <dohnalek...@gmail.com> wrote:
> Dear all,
> regarding the "remaining strong differences" between measured data and 
> calculated SFs from a finished (high-res) structure, I once investigated 
> this a bit, going back to the images and looking up some extreme outliers.
> I found the same - those were clear strong diffraction spots, not ice, not 
> small molecule, genuine protein diffraction. So I had no explanation for 
> those. Some were even "forbidden" intensities, because of screw axes which 
> were correct. The structure refined perfectly, no problems at all.
> I then found some literature about the possibilities of multiple reflections 
> - I guess this is possible but I wonder if you could get easily say a 25 
> sigma I in this way.
> 
> And as we often end our beer-discussions - maybe all protein space groups 
> are actually true P1, just close enough to satisfy the high symmetry rules .. 
> but this is getting a bit philosophical I know ..
> 
> Jan Dohnalek
> 
> 
> 
> 
> On Wed, Oct 16, 2019 at 6:24 PM Randy Read <rj...@cam.ac.uk> wrote:
> James,
> 
> Where we diverge is with your interpretation that big differences lead to 
> small FOMs.  The size of the FOM depends on the product of Fo and Fc, not 
> their difference.  The FOM for a reflection where Fo=1000 and Fc=10 is very 
> different from the FOM for a reflection with Fo=5000 and Fc=4010, even though 
> the difference is the same.
> 
> Expanding on this: 
> 
> 1. The FOM actually depends more on the E values, i.e. reflections smaller 
> than average get lower FOM values than ones bigger than average.  In the 
> resolution bin from 5.12 to 5.64Å of 2vb1, the mean observed intensity is 
> 20687 and the mean calculated intensity is 20022, which means that 
> Eobs=Sqrt(145.83/20687)=0.084 and Ecalc=Sqrt(7264/20022)=0.602.  This 
> reflection gets a low FOM because the product (0.050) is such a small number, 
> not because the difference is big.
> 
> 2. You have to consider the role of the model error in the difference, 
> because for precisely-measured data most of the difference comes from model 
> error.  In this resolution shell, the correlation coefficient between Iobs 
> and Fcalc^2 is about 0.88, which means that sigmaA is about Sqrt(0.88) = 
> 0.94.  The variance of both the real and imaginary components of Ec (as an 
> estimate of the phased true E) will be (1-0.94^2)/2 = 0.058, so the standard 
> deviations of the real and imaginary components of Ec will be about 0.24.  In 
> that context, the difference between Eobs and Ecalc is nothing like a 
> 2000-sigma outlier.
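
The arithmetic in points 1 and 2 can be reproduced with a few lines of Python. All input numbers are the ones quoted in the thread. The final ratio is only a crude illustration added here (an assumption, not a proper likelihood calculation) of how far apart Eobs and Ecalc are on the scale of the model error.

```python
import math

# Values quoted in the thread for reflection (-5,2,2) of 2vb1
i_obs = 145.83
f_calc_sq = 7264.0
mean_i_obs = 20687.0      # mean observed intensity, 5.12-5.64 A bin
mean_i_calc = 20022.0     # mean calculated intensity, same bin
cc_iobs_fcalc2 = 0.88     # correlation between Iobs and Fcalc^2 in the bin

# Point 1: normalised amplitudes and their product
e_obs = math.sqrt(i_obs / mean_i_obs)
e_calc = math.sqrt(f_calc_sq / mean_i_calc)
product = e_obs * e_calc

# Point 2: sigmaA (rounded to two decimals, as in the thread) and the
# variance of each of the real and imaginary components
sigma_a = round(math.sqrt(cc_iobs_fcalc2), 2)
variance = (1.0 - sigma_a ** 2) / 2.0
sd_component = math.sqrt(variance)

# Crude illustration only: amplitude difference in units of that spread
crude_ratio = abs(e_calc - e_obs) / sd_component

print(f"Eobs={e_obs:.3f} Ecalc={e_calc:.3f} product={product:.3f}")
print(f"sigmaA={sigma_a:.2f} variance={variance:.3f} sd={sd_component:.2f}")
print(f"|Ecalc-Eobs|/sd = {crude_ratio:.1f}")
```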
> 
> Looking at this another way, the reason why the FOM is low for this 
> reflection is that the conditional probability distribution of Eo given Ec 
> has significant values on the other side of the origin of the complex plane. 
> That means that the *phase* of the complex Eo is very uncertain.  The figures 
> in this web page 
> (https://www-structmed.cimr.cam.ac.uk/Course/Statistics/statistics.html) 
> should help to explain that idea.
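
To make the "other side of the origin" picture concrete, here is a sketch that computes an FOM as the probability-weighted mean cosine of the phase difference. It assumes the usual acentric form in which the phase-difference probability is proportional to exp(X cos(dphi)) with X = 2*sigmaA*Eo*Ec/(1 - sigmaA^2), and it ignores measurement error in Eo; the thread does not spell this formula out, so treat it as an assumption. The second call uses made-up E values for comparison.

```python
import math

def fom_acentric(e_obs, e_calc, sigma_a, n_steps=720):
    """Mean cosine of the phase difference, weighted by its probability.

    Assumes p(dphi) ~ exp(X*cos(dphi)), X = 2*sigmaA*Eo*Ec/(1 - sigmaA^2),
    for an acentric reflection with no measurement error.
    """
    x = 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)
    num = 0.0
    den = 0.0
    for k in range(n_steps):
        dphi = 2.0 * math.pi * (k + 0.5) / n_steps   # midpoint rule
        # Subtract 1 in the exponent to avoid overflow for large X;
        # the constant factor cancels in the ratio.
        w = math.exp(x * (math.cos(dphi) - 1.0))
        num += math.cos(dphi) * w
        den += w
    return num / den

sigma_a = 0.94
# E values quoted in the thread for reflection (-5,2,2) of 2vb1
fom_weak = fom_acentric(0.084, 0.602, sigma_a)
# Hypothetical reflection with both E values above average
fom_strong = fom_acentric(1.5, 1.5, sigma_a)

print(f"FOM (small product): {fom_weak:.2f}")
print(f"FOM (large product): {fom_strong:.2f}")
```

Only the product Eo*Ec enters X, which is why two reflections with the same amplitude difference can get very different FOMs.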
> 
> Best wishes,
> 
> Randy
> 
>> On 16 Oct 2019, at 16:02, James Holton <jmhol...@lbl.gov> wrote:
>> 
>> 
>> All very true Randy,
>> 
>> But nevertheless every hkl has an FOM assigned to it, and that is used to 
>> calculate the map.  Statistical distribution or not, the trend is that hkls 
>> with big amplitude differences get smaller FOMs, so that means large 
>> model-to-data discrepancies are down-weighted.  I wonder sometimes at what 
>> point this becomes a self-fulfilling prophecy?  If you look in detail at 
>> the Fo-Fc differences in pretty much any refined structure in the PDB you 
>> will find huge outliers.  Some are hundreds of sigmas, and they can go in 
>> either direction.
>> 
>> Take for example reflection -5,2,2 in the highest-resolution lysozyme 
>> structure in the PDB: 2vb1.  Iobs(-5,2,2) was recorded as 145.83 ± 3.62 (at 
>> 5.4 Ang) with Fcalc^2(-5,2,2) = 7264.  A 2000-sigma outlier!  What are the 
>> odds?   On the other hand, Iobs(4,-6,2) = 1611.21 ± 30.67 vs Fcalc^2(4,-6,2) 
>> = 73, which is in the opposite direction.  One can always suppose 
>> "experimental errors", but ZD sent me these images and I have looked at all 
>> the spots involved in these hkls.  I don't see anything wrong with any of 
>> them.  The average multiplicity of this data set was 7.1 and involved 3 
>> different kappa angles, so I don't think these are "zingers" or other weird 
>> measurement problems.
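
For reference, the "sigma" figures quoted here come from dividing the intensity discrepancy by the measurement error alone. A small sketch, using only the numbers given above and assuming Iobs and Fcalc^2 are already on the same scale (the helper name is made up for illustration):

```python
def sigma_outlier(i_obs, sig_i_obs, f_calc_sq):
    """Discrepancy (Fcalc^2 - Iobs) in units of sigma(Iobs) only.

    Model error is deliberately not included in the denominator.
    """
    return (f_calc_sq - i_obs) / sig_i_obs

z_m522 = sigma_outlier(145.83, 3.62, 7264.0)     # reflection (-5,2,2)
z_4m62 = sigma_outlier(1611.21, 30.67, 73.0)     # reflection (4,-6,2)

print(f"(-5,2,2): {z_m522:.0f} sigma")
print(f"(4,-6,2): {z_4m62:.0f} sigma")
```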
>> 
>> I'm not just picking on 2vb1 here.  EVERY PDB entry has this problem.  Not 
>> sure where it comes from, but the FOM assigned to these huge differences is 
>> always small, so whatever is causing them won't show up in an FOM-weighted 
>> map.
>> 
>> Is there a way to "change up" the statistical distribution that assigns FOMs 
>> to hkls?  Or are we stuck with this systematic error?
>> 
>> -James Holton
>> MAD Scientist
>> 
>> On 10/4/2019 9:31 AM, Randy Read wrote:
>>> Hi James,
>>> 
>>> I'm sure you realise this, but it's important for other readers to remember 
>>> that the FOM is a statistical quantity: we have a probability distribution 
>>> for the true phase, we pick one phase (the "centroid" phase that should 
>>> minimise the RMS error in the density map), and then the FOM is the 
>>> expected value of the cosine of the phase error, obtained by taking the 
>>> cosines of all possible phase differences and weighting by the probability 
>>> of that phase difference.  Because it's a statistical quantity from a random 
>>> distribution, you really can't expect this to agree reflection by 
>>> reflection!  It's a good start to see that the overall values are good, but 
>>> if you want to look more closely you have to look at groups of reflections, 
>>> e.g. bins of resolution, bins of observed amplitude, bins of calculated 
>>> amplitude.  However, each bin has to have enough members that the average 
>>> will generally be close to the expected value.
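
A toy simulation can illustrate this point. The sketch below assumes an idealised error model (Eo = sigmaA*Ec plus complex Gaussian noise, acentric reflections, no measurement error) and an arbitrary sigmaA of 0.7; none of these choices come from the thread. It compares the FOM with the true cosine of the phase difference, reflection by reflection and in bins.

```python
import math
import random

N_STEPS = 360
COS_TABLE = [math.cos(2.0 * math.pi * (k + 0.5) / N_STEPS) for k in range(N_STEPS)]

def fom_acentric(e_obs, e_calc, sigma_a):
    """Probability-weighted mean cosine, p(dphi) ~ exp(X*cos(dphi))."""
    x = 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)
    weights = [math.exp(x * (c - 1.0)) for c in COS_TABLE]
    return sum(c * w for c, w in zip(COS_TABLE, weights)) / sum(weights)

def simulate(n_refl=4000, sigma_a=0.7, seed=0):
    """Return (FOM, true cos(phase difference)) for simulated reflections."""
    rng = random.Random(seed)
    s_model = math.sqrt(0.5)                          # <|Ec|^2> = 1
    s_err = math.sqrt((1.0 - sigma_a ** 2) / 2.0)     # per component
    pairs = []
    for _ in range(n_refl):
        ec = complex(rng.gauss(0.0, s_model), rng.gauss(0.0, s_model))
        eo = sigma_a * ec + complex(rng.gauss(0.0, s_err), rng.gauss(0.0, s_err))
        true_cos = (eo * ec.conjugate()).real / (abs(eo) * abs(ec))
        pairs.append((fom_acentric(abs(eo), abs(ec), sigma_a), true_cos))
    return pairs

def binned_means(pairs, n_bins=4):
    """Sort by FOM, split into equal bins, return (mean FOM, mean true cos)."""
    ordered = sorted(pairs)
    size = len(ordered) // n_bins
    result = []
    for b in range(n_bins):
        chunk = ordered[b * size:(b + 1) * size]
        result.append((sum(p[0] for p in chunk) / len(chunk),
                       sum(p[1] for p in chunk) / len(chunk)))
    return result

pairs = simulate()
bins = binned_means(pairs)
for mean_fom, mean_cos in bins:
    print(f"mean FOM {mean_fom:.2f}   mean true cos {mean_cos:.2f}")
```

Individual reflections can have a negative true cosine despite a positive FOM, while the bin averages should track each other as long as each bin holds enough reflections.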
>>> 
>>> Best wishes,
>>> 
>>> Randy Read
>>> 
>>>> On 4 Oct 2019, at 16:38, James Holton <jmhol...@lbl.gov> wrote:
>>>> 
>>>> I've done a few little experiments over the years using simulated data 
>>>> where I know the "correct" phase, trying to see just how accurate FOMs 
>>>> are.  What I have found in general is that overall FOM values are fairly 
>>>> well correlated to overall phase error, but if you go 
>>>> reflection-by-reflection they are terrible.  I suppose this is because FOM 
>>>> estimates are rooted in amplitudes.  Good agreement in 
>>>> amplitude gives you more confidence in the model (and therefore the 
>>>> phases), but if your R factor is 55% then your phases probably aren't very 
>>>> good either.  However, if you look at any given h,k,l those assumptions 
>>>> become less and less applicable.  Still, it's the only thing we've got.
>>>> 
>>>> At the end of the day, the phase you get out of a refinement program is 
>>>> the phase of the model.  All those fancy "FWT" coefficients with "m" and 
>>>> "D" or "FOM" weights are modifications to the amplitudes, not the phases.  
>>>> The phases in your 2mFo-DFc map are identical to those of just an Fc map.  
>>>> Seriously, have a look!  Sometimes you will get a 180 flip to keep the 
>>>> sign of the amplitude positive, but that's it.  
>>>> Nevertheless, the electron density of a 2mFo-DFc map is closer to the 
>>>> "correct" electron density than any other map.  This is quite remarkable 
>>>> considering that the "phase error" is the same.
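
The point about phases can be sketched as follows: the weights change only the amplitude, and the stored phase is either the model phase or the model phase plus 180 degrees. The function name and the example numbers are assumptions for illustration, not the output of any particular program.

```python
def two_mfo_dfc(fo, fc, phi_c_deg, m, d):
    """Return (amplitude, phase_deg) for a 2mFo-DFc map coefficient.

    The phase is the model phase unless the signed amplitude is negative,
    in which case the amplitude is made positive and the phase is flipped
    by 180 degrees.
    """
    amplitude = 2.0 * m * fo - d * fc
    phase = phi_c_deg % 360.0
    if amplitude < 0.0:
        amplitude = -amplitude
        phase = (phase + 180.0) % 360.0
    return amplitude, phase

# Made-up reflections (illustrative values only)
normal = two_mfo_dfc(fo=100.0, fc=80.0, phi_c_deg=45.0, m=0.9, d=0.95)
flipped = two_mfo_dfc(fo=10.0, fc=80.0, phi_c_deg=45.0, m=0.3, d=0.95)

print(normal)    # model phase kept
print(flipped)   # phase shifted by 180 degrees
```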
>>>> 
>>>> This realization is what led my colleagues and me to forget about "phase 
>>>> error" and start looking at the error in the electron density itself 
>>>> (10.1073/pnas.1302823110).  We did this rather pedagogically.  Basically, 
>>>> pretend you did the whole experiment again, but "change up" the source of 
>>>> error of interest.  For example if you want to see the effect of sigma(F) 
>>>> then you add random noise with the same magnitude as sigma(F) to the Fs, 
>>>> and then re-refine the structure.  This gives you your new phases, and a 
>>>> new map. Do this 50 or so times and you get a pretty good idea of how any 
>>>> source of error of interest propagates into your map.  There is even a 
>>>> little feature in coot for animating these maps, which gives a much more 
>>>> intuitive view of the "noise".  You can also look at variation of model 
>>>> parameters like the refined occupancy of a ligand, which is a good way to 
>>>> put an "error bar" on it.  The trick is finding the right source of error 
>>>> to propagate.
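
The noise-injection step of this procedure might look like the sketch below. It covers only the generation of perturbed amplitude sets; the re-refinement and map calculation would be done by an external program and are not shown. The function names, the clamping of negative amplitudes to zero, and the example data are all assumptions made for illustration.

```python
import random
import statistics

def perturbed_datasets(f_obs, sig_f, n_trials=50, seed=0):
    """Return n_trials copies of f_obs with Gaussian noise of width sig_f added.

    Negative amplitudes are clamped to zero (an assumption of this sketch).
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        trials.append([max(0.0, rng.gauss(f, s)) for f, s in zip(f_obs, sig_f)])
    return trials

def spread_per_reflection(trials):
    """Sample standard deviation of each reflection across the trials."""
    return [statistics.stdev(column) for column in zip(*trials)]

# Made-up amplitudes and sigmas for three reflections
f_obs = [120.0, 85.5, 310.2]
sig_f = [2.0, 1.5, 4.0]

trials = perturbed_datasets(f_obs, sig_f)
# Each entry of `trials` would be written out and re-refined separately;
# the resulting maps or parameters are then compared across trials.
spread = spread_per_reflection(trials)
print(spread)
```

The same pattern applies to any refined parameter: collect its value from each re-refinement and take the spread across trials as a rough error bar.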
>>>> 
>>>> -James Holton
>>>> MAD Scientist
>>>> 
>>>> 
>>>> On 10/2/2019 2:47 PM, Andre LB Ambrosio wrote:
>>>>> Dear all,
>>>>> 
>>>>> How is the phase error estimated for any given reflection, specifically 
>>>>> in the context of model refinement? In terms of math I mean.
>>>>> 
>>>>> How useful is FOM in assessing the phase quality, when not for initial 
>>>>> experimental phases?
>>>>> 
>>>>> Many thanks in advance,
>>>>> 
>>>>> Andre.
>>>>> 
>>>>> To unsubscribe from the CCP4BB list, click the following link:
>>>>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1 
>>>>> <https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1>
>>>> 
>>> ------
>>> Randy J. Read
>>> Department of Haematology, University of Cambridge
>>> Cambridge Institute for Medical Research     Tel: + 44 1223 336500
>>> The Keith Peters Building                    Fax: + 44 1223 336827
>>> Hills Road                                   E-mail: rj...@cam.ac.uk
>>> Cambridge CB2 0XY, U.K.                      www-structmed.cimr.cam.ac.uk
>> 
> 
> 
> -- 
> Jan Dohnalek, Ph.D
> Institute of Biotechnology
> Academy of Sciences of the Czech Republic
> Biocev
> Prumyslova 595
> 252 50 Vestec near Prague
> Czech Republic
> 
> Tel. +420 325 873 758
> 
------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research     Tel: + 44 1223 336500
The Keith Peters Building                    Fax: + 44 1223 336827
Hills Road                                   E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                      www-structmed.cimr.cam.ac.uk

