Thank you for this.
Hmmm.
Interesting, and good to know the expected distribution of extreme values.
However, what I'm more worried about is how to evaluate the other 999
points. Let's say I'm trying to compare two 1000-member sets (A and B)
that both have an extreme value of 3 sigma, but the other 999 are all
2 sigma in "A" and 1 sigma in "B". Clearly, "B" is better than "A", but
how to quantify that?
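One possible way to put a number on this, sketched under the assumption that every deviate is an independent draw from a unit Gaussian (an assumption, not something established in this thread), is to sum the squared deviates. For n deviates that sum follows a chi-square distribution with n degrees of freedom, which has mean n and variance 2n, so for large n its distance from n in units of sqrt(2n) is roughly a z-score for the whole set. The function name below is made up for illustration.

```python
import math

def chi_square_summary(deviates):
    """Return (sum of squares, reduced chi-square, approximate z-score).

    Assumes each deviate is already in sigma units and that the deviates
    are independent unit Gaussians, so the sum of squares has mean n and
    variance 2n. The z-score uses the large-n normal approximation.
    """
    n = len(deviates)
    chi2 = sum(z * z for z in deviates)
    return chi2, chi2 / n, (chi2 - n) / math.sqrt(2 * n)

# Set A: 999 deviates at 2 sigma plus one at 3 sigma.
set_a = [2.0] * 999 + [3.0]
# Set B: 999 deviates at 1 sigma plus one at 3 sigma.
set_b = [1.0] * 999 + [3.0]

for name, data in (("A", set_a), ("B", set_b)):
    chi2, reduced, z = chi_square_summary(data)
    print(f"{name}: chi2={chi2:.0f} reduced={reduced:.3f} z={z:.2f}")
```

On this measure the two sets differ enormously even though their extreme values are identical: the sum of squares for A is 4005 against an expectation of 1000, while for B it is 1008.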
On 11/8/2022 3:34 PM, Petrus Zwart wrote:
Hi James,
This is what you need.
https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution
The distribution of a maximum of 1k random variates looks like this,
and the (fitted by eye) analytical distribution associated with it
seems to have a decent fit - as expected.
[Attached plot: image.png]
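For independent standard normal variates the distribution of the maximum can also be written down exactly, without a fit: the chance that all n values fall below x is Phi(x)**n, where Phi is the normal CDF. Below is a minimal sketch using only the Python standard library. The function names are illustrative and independence is an assumption. The generalized extreme value form is the large-n limit of this expression.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def max_cdf(x, n):
    """P(max of n independent standard normals <= x) = Phi(x)**n."""
    return STD_NORMAL.cdf(x) ** n

def max_quantile(q, n):
    """Value x such that max_cdf(x, n) == q, by inverting Phi(x)**n = q."""
    return STD_NORMAL.inv_cdf(q ** (1.0 / n))

n = 1000
for q in (0.05, 0.50, 0.95):
    print(f"{q:.0%} quantile of the maximum of {n}: {max_quantile(q, n):.2f}")
```

Note that this treats the signed maximum. For the largest absolute deviate, replace Phi(x) with erf(x/sqrt(2)).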
The idea of a p-value to judge the quality of a structure is
interesting. xtriage uses this mechanism to flag suspicious normalized
intensities, the idea being that in a small dataset it is less likely
to see a large E value as compared to in a large dataset.
The issue, of course, is that a normalized intensity is bounded by
the number of atoms, whereas the underlying assumption used is that
it can potentially be infinitely large. It is still a decent metric,
I think.
P
On Tue, Nov 8, 2022 at 3:25 PM James Holton <jmhol...@lbl.gov> wrote:
Thank you Ian for your quick response!
I suppose what I'm really trying to do is put a p-value on the
"geometry" of a given PDB file. As in: what are the odds the
deviations from ideality of this model are due to chance?
I am leaning toward the need to take all the deviations in the
structure together as a set, but, as Joao just noted, it just
"feels wrong" to tolerate a 3-sigma deviate. Even more wrong to
tolerate 4 sigma or 5 sigma. And 6-sigma deviates are really
difficult to swallow unless you have trillions of data points.
To put it down in equations, is the p-value of a structure with
1000 bonds in it with one 3-sigma deviate given by:
a) p = 1-erf(3/sqrt(2))
or
b) p = 1-erf(3/sqrt(2))**1000
or
c) something else?
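For reference, the two candidate expressions can be evaluated directly. This is a sketch that assumes independent Gaussian deviates and a two-sided test (|z| > 3). It only computes the numbers and does not settle which one is the right p-value.

```python
import math

n_bonds = 1000
z = 3.0

# Probability that a single Gaussian deviate lies within +/- z sigma.
p_within = math.erf(z / math.sqrt(2))

# (a) chance that one particular deviate exceeds z sigma.
p_a = 1 - p_within

# (b) chance that at least one of n_bonds independent deviates exceeds z sigma.
p_b = 1 - p_within ** n_bonds

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.3f}")
```

Expression (a) comes out near 0.0027 and (b) near 0.93.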
On 11/8/2022 2:56 PM, Ian Tickle wrote:
Hi James
I don't think it's meaningful to ask whether the deviation of a
single bond length (or anything else that's single) from its
expected value is significant, since as you say there's always
some finite probability that it occurred purely by chance.
Statistics can only meaningfully be applied to samples of a
'reasonable' size. I know there are statistics designed for
small samples, but not for samples of size 1! It's more
meaningful to talk about distributions. For example if 1% of the
sample contained deviations > 3 sigma when you expected there to
be only 0.3 %, that is probably significant (but it still has a
finite probability of occurring by chance), as would be finding
no deviations > 3 sigma (for a reasonably large sample to avoid
sampling errors).
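This comparison of an observed outlier fraction with the expected one can be sketched as a binomial tail probability. The sample size of 1000 and the counts below are assumed for illustration. The tail fraction of about 0.0027 is the two-sided Gaussian probability of exceeding 3 sigma.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_sf(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

n = 1000                                  # assumed sample size
p_tail = 1 - math.erf(3 / math.sqrt(2))   # P(|z| > 3) for a Gaussian

# Seeing 1% (10 of 1000) beyond 3 sigma when about 0.3% is expected.
print(f"P(>= 10 outliers) = {binom_sf(10, n, p_tail):.2e}")
# Seeing no deviate beyond 3 sigma at all.
print(f"P(0 outliers)     = {binom_pmf(0, n, p_tail):.3f}")
```

How small these probabilities need to be before calling the result significant depends on the sample size and on the threshold one is willing to accept.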
Cheers
-- Ian
On Tue, Nov 8, 2022, 22:22 James Holton <jmhol...@lbl.gov> wrote:
OK, so let's suppose there is this bond in your structure that is
stretched a bit. Is that for real? Or just a random fluke? Let's say,
for example, it's a CA-CB bond that is supposed to be 1.529 A long, but
in your model it's 1.579 A. This is 0.05 A too long. Doesn't seem like
much, right? But the "sigma" given to such a bond in our geometry
libraries is 0.016 A. These sigmas are typically derived from a
database of observed bonds of similar type found in highly accurate
structures, like small molecules. So, that makes this a 3-sigma
outlier. Assuming the distribution of deviations is Gaussian, that's a
pretty unlikely thing to happen. You expect 3-sigma deviates to appear
less than 0.3% of the time. So, is that significant?
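The arithmetic for this one bond can be checked in a few lines. This is a sketch using the numbers quoted above and assuming Gaussian errors. The variable names are illustrative.

```python
import math

ideal = 1.529     # expected CA-CB bond length in Angstrom
observed = 1.579  # bond length in the model in Angstrom
sigma = 0.016     # library standard deviation in Angstrom

z = (observed - ideal) / sigma
# Two-sided Gaussian tail: chance of a deviate at least this large
# in either direction, assuming Gaussian errors.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f} sigma, two-sided tail probability = {p_two_sided:.4f}")
```

The deviation works out to 3.125 sigma, slightly more than the round figure of 3.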
But, then again, there are lots of other bonds in the structure. Let's
say there are 1000. With that many samplings from a Gaussian
distribution you generally expect to see a 3-sigma deviate at least
once. That is, do an "experiment" where you pick 1000 Gaussian-random
numbers from a distribution with a standard deviation of 1.0. Then,
look for the maximum over all 1000 trials. Is that one > 3 sigma? It
probably is. If you do this "experiment" millions of times it turns out
seeing at least one 3-sigma deviate in 1000 tries is very common.
Specifically, about 93% of the time. It is rare indeed to have every
member of a 1000-deviate set all lie within 3 sigmas. So, we have gone
from one 3-sigma deviate being highly unlikely to being a virtual
certainty if you look at enough samples.
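The "experiment" can be sketched with the Python standard library. The function name, trial count and seed are arbitrary choices for illustration, and the trial count is kept small so it runs quickly. The 93% figure corresponds to counting deviates in either direction (|z| > 3), which is what this sketch does. Counting only positive deviates gives a lower fraction, roughly 74%.

```python
import random

def fraction_with_outlier(n_deviates=1000, threshold=3.0, n_trials=2000, seed=0):
    """Fraction of trials in which at least one of n_deviates unit-Gaussian
    random numbers exceeds the threshold in absolute value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # any() stops drawing as soon as one outlier is found in this trial.
        if any(abs(rng.gauss(0.0, 1.0)) > threshold for _ in range(n_deviates)):
            hits += 1
    return hits / n_trials

print(f"{fraction_with_outlier():.3f}")
```

The simulated fraction should scatter around the analytic value 1 - erf(3/sqrt(2))**1000.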
So, my question is: is a 3-sigma deviate significant? Is it significant
only if you have one bond in the structure? What about angles? What if
you have 500 bonds and 500 angles? Do they count as 1000 deviates
together? Or separately?
I'm sure the more mathematically inclined out there will have some
intelligent answers for the rest of us. However, if you are not a
mathematician, how about a vote? Is a 3-sigma bond length deviation
significant? Or not?
Looking forward to both kinds of responses,
-James Holton
MAD Scientist
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
<https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
This message was issued to members of
www.jiscmail.ac.uk/CCP4BB <http://www.jiscmail.ac.uk/CCP4BB>,
a mailing list hosted by www.jiscmail.ac.uk
<http://www.jiscmail.ac.uk>, terms & conditions are available
at https://www.jiscmail.ac.uk/policyandsecurity/
--
------------------------------------------------------------------------------------------
P.H. Zwart
Staff Scientist, Molecular Biophysics and Integrated Bioimaging
Biosciences Lead, Center for Advanced Mathematics for Energy Research
Applications
Lawrence Berkeley National Laboratories
1 Cyclotron Road, Berkeley, CA-94703, USA
Cell: 510 289 9246
PHENIX: http://www.phenix-online.org
CAMERA: http://camera.lbl.gov/
------------------------------------------------------------------------------------------