Hi James,

This is what you need:
https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution

The distribution of the maximum of 1k random variates looks like this, and the analytical distribution associated with it (fitted by eye) seems to fit decently, as expected.

[image: image.png]

The idea of a p-value to judge the quality of a structure is interesting. xtriage uses this mechanism to flag suspicious normalized intensities, the idea being that a large E value is less likely to show up in a small dataset than in a large one. The issue, of course, is that the value of a normalized intensity is bounded by the number of atoms, whereas the underlying assumption is that it can be arbitrarily large. I still think it is a decent metric.

P

On Tue, Nov 8, 2022 at 3:25 PM James Holton <jmhol...@lbl.gov> wrote:

> Thank you Ian for your quick response!
>
> I suppose what I'm really trying to do is put a p-value on the "geometry" of a given PDB file. As in: what are the odds the deviations from ideality of this model are due to chance?
>
> I am leaning toward the need to take all the deviations in the structure together as a set, but, as Joao just noted, it just "feels wrong" to tolerate a 3-sigma deviate. Even more wrong to tolerate 4 sigma, 5 sigma. And 6 sigma deviates are really difficult to swallow unless you have trillions of data points.
>
> To put it down in equations, is the p-value of a structure with 1000 bonds in it with one 3-sigma deviate given by:
>
> a) p = 1-erf(3/sqrt(2))
> or
> b) p = 1-erf(3/sqrt(2))**1000
> or
> c) something else?
>
> On 11/8/2022 2:56 PM, Ian Tickle wrote:
>
> Hi James
>
> I don't think it's meaningful to ask whether the deviation of a single bond length (or anything else that's single) from its expected value is significant, since as you say there's always some finite probability that it occurred purely by chance. Statistics can only meaningfully be applied to samples of a 'reasonable' size.
> I know there are statistics designed for small samples, but not for samples of size 1! It's more meaningful to talk about distributions. For example, if 1% of the sample contained deviations > 3 sigma when you expected only 0.3%, that is probably significant (though it still has a finite probability of occurring by chance), as would be finding no deviations > 3 sigma (for a reasonably large sample, to avoid sampling errors).
>
> Cheers
>
> -- Ian
>
> On Tue, Nov 8, 2022, 22:22 James Holton <jmhol...@lbl.gov> wrote:
>
>> OK, so let's suppose there is this bond in your structure that is stretched a bit. Is that for real? Or just a random fluke? Let's say, for example, it's a CA-CB bond that is supposed to be 1.529 A long, but in your model it's 1.579 A. This is 0.05 A too long. Doesn't seem like much, right? But the "sigma" given to such a bond in our geometry libraries is 0.016 A. These sigmas are typically derived from a database of observed bonds of similar type found in highly accurate structures, like small molecules. So, that makes this a 3-sigma outlier. Assuming the distribution of deviations is Gaussian, that's a pretty unlikely thing to happen. You expect 3-sigma deviates to appear less than 0.3% of the time. So, is that significant?
>>
>> But, then again, there are lots of other bonds in the structure. Let's say there are 1000. With that many samplings from a Gaussian distribution you generally expect to see a 3-sigma deviate at least once. That is, do an "experiment" where you pick 1000 Gaussian-random numbers from a distribution with a standard deviation of 1.0. Then, look for the maximum over all 1000 trials. Is that one > 3 sigma? It probably is. If you do this "experiment" millions of times it turns out seeing at least one 3-sigma deviate in 1000 tries is very common. Specifically, about 93% of the time.
>> It is rare indeed to have every member of a 1000-deviate set all lie within 3 sigmas. So, we have gone from one 3-sigma deviate being highly unlikely to being a virtual certainty if you look at enough samples.
>>
>> So, my question is: is a 3-sigma deviate significant? Is it significant only if you have one bond in the structure? What about angles? What if you have 500 bonds and 500 angles? Do they count as 1000 deviates together? Or separately?
>>
>> I'm sure the more mathematically inclined out there will have some intelligent answers for the rest of us, however, if you are not a mathematician, how about a vote? Is a 3-sigma bond length deviation significant? Or not?
>>
>> Looking forward to both kinds of responses,
>>
>> -James Holton
>> MAD Scientist
>>
>> ########################################################################
>>
>> To unsubscribe from the CCP4BB list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>>
>> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/

-- 
------------------------------------------------------------------------------------------
P.H. Zwart
Staff Scientist, Molecular Biophysics and Integrated Bioimaging
Biosciences Lead, Center for Advanced Mathematics for Energy Research Applications
Lawrence Berkeley National Laboratories
1 Cyclotron Road, Berkeley, CA-94703, USA
Cell: 510 289 9246
PHENIX: http://www.phenix-online.org
CAMERA: http://camera.lbl.gov/
------------------------------------------------------------------------------------------
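
For anyone who wants to check the numbers in this thread, here is a minimal Python sketch. It is not code from any of the posters, and the function names are made up for illustration. It assumes the deviates are independent and standard-normal, and it counts a deviate as "3 sigma" when its absolute value exceeds 3, which is the reading that reproduces the roughly 93% figure quoted in the thread. Option (b) from James's message is read with `**` binding tighter than the subtraction, i.e. 1 - erf(3/sqrt(2))^1000.

```python
import math
import random


def p_single(k: float) -> float:
    """Two-sided probability that one standard-normal deviate exceeds k sigma."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


def p_at_least_one(k: float, n: int) -> float:
    """Probability that at least one of n independent deviates exceeds k sigma.

    This is option (b) from the thread, under the independence assumption.
    """
    return 1.0 - math.erf(k / math.sqrt(2.0)) ** n


def simulate(k: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of p_at_least_one: the fraction of trials in which
    the largest |deviate| among n Gaussian draws exceeds k."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        largest = max(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
        if largest > k:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"one bond,   > 3 sigma:          {p_single(3.0):.4f}")              # about 0.0027
    print(f"1000 bonds, at least one:       {p_at_least_one(3.0, 1000):.4f}")  # about 0.933
    print(f"1000 bonds, simulated fraction: {simulate(3.0, 1000, trials=2000):.4f}")
```

The closed-form value and the simulated fraction should agree to within sampling noise. The independence assumption is the main simplification here; whether bonds and angles in a refined model can be treated that way is exactly the open question in the thread.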