Dear all, regarding the "remaining strong differences" between measured data and calculated SFs from a finished (high-resolution) structure: I once investigated this a bit, going back to the images and looking up some extreme outliers. I found the same - those were clear, strong diffraction spots; not ice, not small molecule, but genuine protein diffraction. So I had no explanation for those. Some were even "forbidden" intensities, i.e. reflections that should be absent because of screw axes that were correct. The structure refined perfectly, no problems at all. I then found some literature about the possibility of multiple reflection - I guess this is possible, but I wonder whether you could easily get, say, a 25-sigma I that way.
And as we often end our beer discussions - maybe all protein space groups are actually true P1, just close enough to satisfy the high-symmetry rules... but this is getting a bit philosophical, I know...

Jan Dohnalek

On Wed, Oct 16, 2019 at 6:24 PM Randy Read <rj...@cam.ac.uk> wrote:

> James,
>
> Where we diverge is with your interpretation that big differences lead to small FOMs. The size of the FOM depends on the product of Fo and Fc, not their difference. The FOM for a reflection where Fo=1000 and Fc=10 is very different from the FOM for a reflection with Fo=5000 and Fc=4010, even though the difference is the same.
>
> Expanding on this:
>
> 1. The FOM actually depends more on the E values, i.e. reflections smaller than average get lower FOM values than ones bigger than average. In the resolution bin from 5.12 to 5.64 Å of 2vb1, the mean observed intensity is 20687 and the mean calculated intensity is 20022, which means that Eobs = Sqrt(145.83/20687) = 0.084 and Ecalc = Sqrt(7264/20022) = 0.602. This reflection gets a low FOM because the product (0.050) is such a small number, not because the difference is big.
>
> 2. You have to consider the role of the model error in the difference, because for precisely-measured data most of the difference comes from model error. In this resolution shell, the correlation coefficient between Iobs and Fcalc^2 is about 0.88, which means that sigmaA is about Sqrt(0.88) = 0.94. The variance of both the real and imaginary components of Ec (as an estimate of the phased true E) will be (1 - 0.94^2)/2 = 0.058, so the standard deviations of the real and imaginary components of Ec will be about 0.24. In that context, the difference between Eobs and Ecalc is nothing like a 2000-sigma outlier.
>
> Looking at this another way, the reason why the FOM is low for this reflection is that the conditional probability distribution of Eo given Ec has significant values on the other side of the origin of the complex plane. That means that the *phase* of the complex Eo is very uncertain. The figures on this web page (https://www-structmed.cimr.cam.ac.uk/Course/Statistics/statistics.html) should help to explain that idea.
>
> Best wishes,
>
> Randy
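To make the dependence on the product concrete, here is a minimal sketch assuming the standard sigmaA expression for an acentric reflection, m = I1(X)/I0(X) with X = 2*sigmaA*Eo*Ec/(1 - sigmaA^2), plugged with the numbers quoted above (2vb1 is P1, so all reflections are acentric; scipy is assumed for the Bessel functions):

```python
# Sketch of the sigmaA figure of merit for an acentric reflection,
# m = I1(X)/I0(X), X = 2*sigmaA*Eo*Ec/(1 - sigmaA^2),
# using the numbers quoted above for reflection (-5,2,2) of 2vb1.
from scipy.special import i0, i1  # modified Bessel functions of order 0 and 1

def fom_acentric(Eo, Ec, sigmaA):
    """sigmaA-weighted figure of merit for an acentric reflection."""
    X = 2.0 * sigmaA * Eo * Ec / (1.0 - sigmaA**2)
    return i1(X) / i0(X)

# The reflection discussed above: tiny Eobs, modest Ecalc -> low FOM (~0.4)
print(fom_acentric(0.084, 0.602, 0.94))

# Same |Eo - Ec| difference, but larger E values -> FOM close to 1,
# illustrating that m is governed by the product, not the difference.
print(fom_acentric(1.084, 1.602, 0.94))
```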
> On 16 Oct 2019, at 16:02, James Holton <jmhol...@lbl.gov> wrote:
>
> All very true, Randy.
>
> But nevertheless every hkl has an FOM assigned to it, and that is used to calculate the map. Statistical distribution or not, the trend is that hkls with big amplitude differences get smaller FOMs, so large model-to-data discrepancies are down-weighted. I wonder sometimes at what point this becomes a self-fulfilling prophecy? If you look in detail at the Fo-Fc differences in pretty much any refined structure in the PDB you will find huge outliers. Some are hundreds of sigmas, and they can go in either direction.
>
> Take for example reflection -5,2,2 in the highest-resolution lysozyme structure in the PDB: 2vb1. Iobs(-5,2,2) was recorded as 145.83 ± 3.62 (at 5.4 Å) with Fcalc^2(-5,2,2) = 7264. A 2000-sigma outlier! What are the odds? On the other hand, Iobs(4,-6,2) = 1611.21 ± 30.67 vs Fcalc^2(4,-6,2) = 73, which is in the opposite direction. One can always suppose "experimental errors", but ZD sent me these images and I have looked at all the spots involved in these hkls. I don't see anything wrong with any of them. The average multiplicity of this data set was 7.1 and involved 3 different kappa angles, so I don't think these are "zingers" or other weird measurement problems.
>
> I'm not just picking on 2vb1 here. EVERY PDB entry has this problem. I'm not sure where it comes from, but the FOM assigned to these huge differences is always small, so whatever is causing them won't show up in an FOM-weighted map.
>
> Is there a way to "change up" the statistical distribution that assigns FOMs to hkls? Or are we stuck with this systematic error?
>
> -James Holton
> MAD Scientist
>
> On 10/4/2019 9:31 AM, Randy Read wrote:
>
> Hi James,
>
> I'm sure you realise this, but it's important for other readers to remember that the FOM is a statistical quantity: we have a probability distribution for the true phase, we pick one phase (the "centroid" phase that should minimise the RMS error in the density map), and then the FOM is the expected value of the cosine of the phase error, obtained by taking the cosines of all possible phase differences and weighting each by the probability of that phase difference. Because it's a statistical quantity from a random distribution, you really can't expect this to agree reflection by reflection! It's a good start to see that the overall values are good, but if you want to look more closely you have to look at groups of reflections, e.g. bins of resolution, bins of observed amplitude, bins of calculated amplitude. However, each bin has to have enough members that the average will generally be close to the expected value.
>
> Best wishes,
>
> Randy Read
>
> On 4 Oct 2019, at 16:38, James Holton <jmhol...@lbl.gov> wrote:
>
> I've done a few little experiments over the years using simulated data where I know the "correct" phase, trying to see just how accurate FOMs are. What I have found in general is that overall FOM values are fairly well correlated with overall phase error, but if you go reflection-by-reflection they are terrible. I suppose this is because FOM estimates are rooted in amplitudes. Good agreement in amplitude gives you more confidence in the model (and therefore the phases), but if your R factor is 55% then your phases probably aren't very good either. However, if you look at any given h,k,l those assumptions become less and less applicable. Still, it's the only thing we've got.
>
> At the end of the day, the phase you get out of a refinement program is the phase of the model. All those fancy "FWT" coefficients with "m" and "D" or "FOM" weights are modifications to the amplitudes, not the phases. The phases in your 2mFo-DFc map are identical to those of a plain Fc map. Seriously, have a look! Sometimes you will get a 180 flip to keep the sign of the amplitude positive, but that's it. Nevertheless, the electron density of a 2mFo-DFc map is closer to the "correct" electron density than any other map. This is quite remarkable considering that the "phase error" is the same.
>
> This realization is what led my colleagues and me to forget about "phase error" and start looking at the error in the electron density itself (10.1073/pnas.1302823110). We did this rather pedagogically. Basically, pretend you did the whole experiment again, but "change up" the source of error of interest. For example, if you want to see the effect of sigma(F), you add random noise with the same magnitude as sigma(F) to the Fs, and then re-refine the structure. This gives you your new phases, and a new map. Do this 50 or so times and you get a pretty good idea of how any source of error of interest propagates into your map. There is even a little feature in coot for animating these maps, which gives a much more intuitive view of the "noise". You can also look at the variation of model parameters, such as the refined occupancy of a ligand, which is a good way to put an "error bar" on it. The trick is finding the right source of error to propagate.
>
> -James Holton
> MAD Scientist
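A minimal sketch of the map-perturbation loop described above, assuming a hypothetical refine_and_map() callback that stands in for the real re-refinement and map calculation (in practice, one refinement-program run per trial):

```python
# Sketch of the "change up the error and repeat" procedure described
# above (10.1073/pnas.1302823110): jitter each F by Gaussian noise of
# width sigma(F), re-refine, recompute the map, and repeat.
# refine_and_map() is a hypothetical stand-in, not a real API.
import numpy as np

def propagate_sigf(F, sigF, refine_and_map, n_trials=50, seed=0):
    """Return the per-voxel mean and standard deviation over n_trials maps;
    the sd map is an 'error bar' on the electron density itself."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_trials):
        F_noisy = F + rng.normal(0.0, sigF)    # add noise ~ sigma(F)
        maps.append(refine_and_map(F_noisy))   # re-refine, recompute map
    maps = np.stack(maps)
    return maps.mean(axis=0), maps.std(axis=0)
```

The stack of trial maps is what the coot animation feature would flip through; collecting a refined parameter (e.g. a ligand occupancy) across the same trials gives the "error bar" on that parameter that James mentions.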
> On 10/2/2019 2:47 PM, Andre LB Ambrosio wrote:
>
> Dear all,
>
> How is the phase error estimated for any given reflection, specifically in the context of model refinement? In terms of the math, I mean.
>
> How useful is FOM in assessing phase quality, when not for initial experimental phases?
>
> Many thanks in advance,
>
> Andre.
>
> ------
> Randy J. Read
> Department of Haematology, University of Cambridge
> Cambridge Institute for Medical Research, The Keith Peters Building
> Hills Road, Cambridge CB2 0XY, U.K.
> Tel: +44 1223 336500   Fax: +44 1223 336827
> E-mail: rj...@cam.ac.uk
> www-structmed.cimr.cam.ac.uk

--
Jan Dohnalek, Ph.D.
Institute of Biotechnology
Academy of Sciences of the Czech Republic
Biocev, Prumyslova 595
252 50 Vestec near Prague
Czech Republic
Tel. +420 325 873 758