Hi James,

This is what you need.

https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution

The distribution of the maximum of 1,000 random variates looks like this,
and the corresponding analytical distribution (fitted by eye) agrees with
it reasonably well, as expected.

[image: image.png]
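
For anyone who wants to reproduce this kind of comparison, here is a
minimal sketch in Python (standard library only). It assumes standard
normal variates and looks at the signed maximum, not the absolute
deviate. The function names are made up for illustration, and the Gumbel
location and scale are the textbook asymptotic constants for Gaussian
maxima, not the by-eye fit mentioned above, so the agreement is only
approximate.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_max_cdf(x, n):
    """P(max of n independent standard normal variates <= x)."""
    return normal_cdf(x) ** n


def gumbel_constants(n):
    """Asymptotic location and scale for the maximum of n standard normals.

    Assumption: textbook norming constants, not a fit to data.
    """
    root = math.sqrt(2.0 * math.log(n))
    loc = root - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * root)
    scale = 1.0 / root
    return loc, scale


def gumbel_cdf(x, loc, scale):
    """CDF of the Gumbel distribution (GEV with shape parameter 0)."""
    return math.exp(-math.exp(-(x - loc) / scale))


def simulate_maxima(n, trials, seed=0):
    """Draw `trials` maxima, each over n standard normal variates."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]


if __name__ == "__main__":
    n = 1000
    loc, scale = gumbel_constants(n)
    maxima = simulate_maxima(n, trials=1000)
    print("x     simulated  exact  gumbel")
    for x in (2.5, 3.0, 3.5, 4.0, 4.5):
        simulated = sum(m <= x for m in maxima) / len(maxima)
        print(f"{x:.1f}   {simulated:.3f}      "
              f"{exact_max_cdf(x, n):.3f}  {gumbel_cdf(x, loc, scale):.3f}")
```

The Gumbel form is the shape-parameter-zero member of the generalized
extreme value family. Convergence for Gaussian maxima is slow, so small
differences from the exact curve at n = 1000 are to be expected.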

The idea of a p-value to judge the quality of a structure is interesting.
xtriage uses this mechanism to flag suspicious normalized intensities, the
idea being that a large E value is less likely to show up in a small
dataset than in a large one.
The issue, of course, is that a normalized intensity is bounded by the
number of atoms, whereas the underlying assumption is that it can
potentially be infinitely large. I still think it is a decent metric.
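
To make the "largest value in a set of N" p-value concrete, here is a
small sketch in Python. It is not xtriage's code; the function names are
hypothetical. The first two functions assume independent Gaussian
deviates and correspond to options (a) and (b) in the quoted message
below. The third assumes acentric Wilson statistics, where
P(E^2 > x) = exp(-x), and ignores the upper bound set by the number of
atoms.

```python
import math


def p_single_gaussian(z):
    """Two-sided probability that one Gaussian deviate exceeds z sigma."""
    return math.erfc(z / math.sqrt(2.0))


def _p_at_least_one(p_single, n):
    """P(at least one of n independent events), each with probability p_single.

    Uses log1p/expm1 so that very small p-values are not lost to rounding.
    """
    if p_single >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p_single))


def p_max_gaussian(z, n):
    """P(at least one of n Gaussian deviates exceeds z sigma in magnitude).

    Equivalent to 1 - erf(z / sqrt(2)) ** n.
    """
    return _p_at_least_one(p_single_gaussian(z), n)


def p_max_acentric_e2(e2, n):
    """P(largest E^2 among n acentric reflections exceeds e2).

    Assumption: P(E^2 > x) = exp(-x) for a single acentric reflection.
    """
    return _p_at_least_one(math.exp(-e2), n)


if __name__ == "__main__":
    print("one deviate, 3 sigma:      ", p_single_gaussian(3.0))
    print("1000 deviates, max 3 sigma:", p_max_gaussian(3.0, 1000))
    for n in (1000, 100000):
        print(f"E^2 > 10 among {n} reflections:", p_max_acentric_e2(10.0, n))
```

With 1000 deviates, the chance of seeing at least one beyond 3 sigma
comes out near 93%, in line with the simulation James describes. The
same E^2 value gives a small p-value in a small dataset and a large one
in a large dataset.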

P


On Tue, Nov 8, 2022 at 3:25 PM James Holton <jmhol...@lbl.gov> wrote:

> Thank you Ian for your quick response!
>
> I suppose what I'm really trying to do is put a p-value on the "geometry"
> of a given PDB file.  As in: what are the odds the deviations from ideality
> of this model are due to chance?
>
> I am leaning toward the need to take all the deviations in the structure
> together as a set, but, as Joao just noted, it just "feels wrong" to
> tolerate a 3-sigma deviate.  Even more wrong to tolerate 4 sigma, 5 sigma.
> And 6-sigma deviates are really difficult to swallow unless you have
> trillions of data points.
>
> To put it down in equations, is the p-value of a structure with 1000 bonds
> in it with one 3-sigma deviate given by:
>
> a)  p = 1-erf(3/sqrt(2))
> or
> b)  p = 1-erf(3/sqrt(2))**1000
> or
> c) something else?
>
>
>
> On 11/8/2022 2:56 PM, Ian Tickle wrote:
>
> Hi James
>
> I don't think it's meaningful to ask whether the deviation of a single
> bond length (or anything else that's single) from its expected value is
> significant, since as you say there's always some finite probability that
> it occurred purely by chance.  Statistics can only meaningfully be applied
> to samples of a 'reasonable' size.  I know there are statistics designed
> for small samples but not for samples of size 1 !  It's more meaningful to
> talk about distributions.  For example if 1% of the sample contained
> deviations > 3 sigma when you expected there to be only 0.3 %, that is
> probably significant (but it still has a finite probability of occurring by
> chance), as would be finding no deviations > 3 sigma (for a reasonably
> large sample to avoid sampling errors).
>
> Cheers
>
> -- Ian
>
>
> On Tue, Nov 8, 2022, 22:22 James Holton <jmhol...@lbl.gov> wrote:
>
>> OK, so let's suppose there is this bond in your structure that is
>> stretched a bit.  Is that for real? Or just a random fluke?  Let's say
>> for example it's a CA-CB bond that is supposed to be 1.529 A long, but in
>> your model it's 1.579 A.  This is 0.05 A too long. Doesn't seem like
>> much, right? But the "sigma" given to such a bond in our geometry
>> libraries is 0.016 A.  These sigmas are typically derived from a
>> database of observed bonds of similar type found in highly accurate
>> structures, like small molecules. So, that makes this a 3-sigma outlier.
>> Assuming the distribution of deviations is Gaussian, that's a pretty
>> unlikely thing to happen. You expect 3-sigma deviates to appear less
>> than 0.3% of the time.  So, is that significant?
>>
>> But, then again, there are lots of other bonds in the structure. Let's
>> say there are 1000. With that many samplings from a Gaussian
>> distribution you generally expect to see a 3-sigma deviate at least
>> once.  That is, do an "experiment" where you pick 1000 Gaussian-random
>> numbers from a distribution with a standard deviation of 1.0. Then, look
>> for the maximum over all 1000 trials. Is that one > 3 sigma? It probably
>> is. If you do this "experiment" millions of times it turns out seeing at
>> least one 3-sigma deviate in 1000 tries is very common. Specifically,
>> about 93% of the time. It is rare indeed to have every member of a
>> 1000-deviate set all lie within 3 sigmas.  So, we have gone from one
>> 3-sigma deviate being highly unlikely to being a virtual certainty if
>> you look at enough samples.
>>
>> So, my question is: is a 3-sigma deviate significant?  Is it significant
>> only if you have one bond in the structure?  What about angles? What if
>> you have 500 bonds and 500 angles?  Do they count as 1000 deviates
>> together? Or separately?
>>
>> I'm sure the more mathematically inclined out there will have some
>> intelligent answers for the rest of us. However, if you are not a
>> mathematician, how about a vote?  Is a 3-sigma bond length deviation
>> significant? Or not?
>>
>> Looking forward to both kinds of responses,
>>
>> -James Holton
>> MAD Scientist
>>
>> ########################################################################
>>
>> To unsubscribe from the CCP4BB list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>>
>> This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
>> mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
>> available at https://www.jiscmail.ac.uk/policyandsecurity/
>>
>
>


-- 
------------------------------------------------------------------------------------------
P.H. Zwart
Staff Scientist, Molecular Biophysics and Integrated Bioimaging
Biosciences Lead, Center for Advanced Mathematics for Energy Research
Applications
Lawrence Berkeley National Laboratories
1 Cyclotron Road, Berkeley, CA-94703, USA
Cell: 510 289 9246
PHENIX:   http://www.phenix-online.org
CAMERA: http://camera.lbl.gov/
------------------------------------------------------------------------------------------
