Thank you Gergely,
Oh, don't worry, I am not concerned about belief. Neither the model nor
the data care what I believe.
What I am really asking is: what is the proper way to combine weak
observations?
Right now, in pretty much all structural sciences we are not used to
doing this, but we are entering an era where we will have to.
I was trying to ask a simple question with the 10x10 pixel patch because
(as Graeme, Ian and others pointed out) it highlights how the solution
must also apply to two patches of 50 pixels. In reality, unfortunately,
those two patches might not be next to each other and will have
different Lorentz factors, polarization factors, absorption factors, and
probably different partiality as well. These values are knowable, but
they are not integers. The way we currently deal with all this is to
first convert patches of pixels into an expectation and variance, then
apply all the corrections, and finally "merge" everything with error
propagation into a simple list of h,k,l,Iobs,sigIobs that we can compare
to a PDB file.
You are absolutely right that the best thing to do would be fitting a
model of the whole diffractometer and crystal, structure factors
included, directly and pixel-by-pixel to the image data. Some
colleagues and I managed to do this recently
(https://doi.org/10.1107/s2052252520013007). It is rather
computationally expensive, but seems to be working.
I hope this will be a useful tool, but I don't think such an approach
will ever completely supplant data reduction, as there are many
advantages to the latter. But only if you do the statistics right!
This is why I asked the community so that folks cleverer and more
experienced than I in such matters (such as yourself) can correct me if
I'm getting something wrong. And the community benefits from the
discussion.
Thank you for your thoughtful and thought-provoking insights!
-James Holton
MAD Scientist
On 10/19/2021 2:05 AM, Gergely Katona wrote:
Dear James,
I am sorry to nitpick, but this is the answer to "what is my belief of expectation
and variance if I observe a 10x10 patch of pixels with zero counts?" This will
heavily depend on my model.
When I make predictions like this, my intention is not to replace the data with
"new and improved" data that is closer to the Truth and deposit it in some
database from a position of authority.
I would simply use it to validate my model. Well, my model expects the Iobs to
be 0.01, but in fact it is 0. This may make me slightly worried, but then I
look at the posterior distribution and I see 0 with highest posterior
probability so I relax a bit that I do not have to throw out my model outright.
Still, a better model may be out there.
For a Bayesian the data is fixed and holy; the model may change. And the question rarely manifests
like that, so one does not have to spend a lot of time pondering whether a uniform distribution of the
rate is compatible with my belief in some quantum process. Bayesian folks are pragmatic. Your
question about "what is my belief about the slope and intercept of a line that is the basis of
some time-dependent random process given my observations" is more relevant. It is
straightforward to implement this as a Bayesian network, and it will give you
predictions that look deceptively like the data. Here, you only care about your prior belief about
the magnitude of slope and intercept; the belief about what the rate may be independent of time is
quite irrelevant, and so are the predictions it may make. And I guess you would not intend to
deposit images that were generated by the predictions of these posterior models as the "new
and improved data".
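As a rough illustration of the slope-and-intercept question above, here is a minimal sketch of a grid posterior for a Poisson rate that varies linearly with time. It is not taken from the thread: the function name grid_posterior, the flat prior on the grid, the grid ranges and the example counts are all assumptions made for illustration.

```python
import math

def grid_posterior(times, counts, intercepts, slopes):
    """Posterior over (intercept a, slope b) for a Poisson rate a + b*t.

    Assumes a flat prior over the supplied grid points (an illustrative choice).
    Returns a dict mapping (a, b) to its posterior probability.
    """
    log_post = {}
    for a in intercepts:
        for b in slopes:
            rates = [a + b * t for t in times]
            if min(rates) <= 0.0:
                continue  # a Poisson rate must be positive
            # Sum of Poisson log-likelihoods: k*log(r) - r - log(k!)
            log_post[(a, b)] = sum(
                k * math.log(r) - r - math.lgamma(k + 1)
                for k, r in zip(counts, rates)
            )
    peak = max(log_post.values())  # subtract the peak to avoid underflow
    weights = {ab: math.exp(lp - peak) for ab, lp in log_post.items()}
    total = sum(weights.values())
    return {ab: w / total for ab, w in weights.items()}

# Hypothetical observations: counts recorded at five time points.
times = [0, 1, 2, 3, 4]
counts = [0, 1, 1, 3, 4]
intercepts = [0.05 * i for i in range(1, 61)]  # 0.05 ... 3.0
slopes = [0.05 * i for i in range(0, 41)]      # 0.0 ... 2.0

posterior = grid_posterior(times, counts, intercepts, slopes)
mean_a = sum(a * p for (a, b), p in posterior.items())
mean_b = sum(b * p for (a, b), p in posterior.items())
print(f"posterior mean intercept = {mean_a:.3f}, slope = {mean_b:.3f}")
```

Only the prior on slope and intercept enters here; no separate prior on a time-independent rate is needed, which is the point made above.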
Best wishes,
Gergely
Gergely Katona, Professor, Chairman of the Chemistry Program Council
Department of Chemistry and Molecular Biology, University of Gothenburg
Box 462, 40530 Göteborg, Sweden
Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
Web: http://katonalab.eu, Email: gergely.kat...@gu.se
-----Original Message-----
From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of James Holton
Sent: 18 October, 2021 21:41
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] am I doing this right?
Thank you very much for this Kay!
So, to summarize, you are saying the answer to my question "what is the expectation
and variance if I observe a 10x10 patch of pixels with zero counts?" is:
Iobs = 0.01
sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))
And for the one-pixel case:
Iobs = 1
sigIobs = 1
but in both cases the distribution is NOT Gaussian, but rather exponential. And
that means adding variances may not be the way to propagate error.
Is that right?
-James Holton
MAD Scientist
On 10/18/2021 7:00 AM, Kay Diederichs wrote:
Hi James,
I'm a bit behind ...
My answer about the basic question ("a patch of 100 pixels each with zero counts -
what is the variance?") you ask is the following:
1) we all know the Poisson PDF (Probability Distribution Function) P(k|l) =
l^k*e^(-l)/k! (where k stands for an integer >=0 and l is lambda), which
tells us the probability of observing k counts if we know l. The PDF is
normalized: SUM_over_k (P(k|l)) for k=0...infinity is 1.
2) you don't know before the experiment what l is, and you assume it is some number x
with 0<=x<=xmax (the xmax limit can be calculated by looking at the physics of
the experiment; it is finite and less than the overload value of the pixel, otherwise
you should do a different experiment). Since you don't know that number, all the x
values are equally likely - you use a uniform prior.
3) what is the PDF P(l|k) of l if we observe k counts? That can be found with Bayes
theorem, and it turns out that (due to the uniform prior) the right hand side of the
formula looks the same as in 1) : P(l|k) = l^k*e^(-l)/k! (again, the ! stands for the
factorial, it is not a semantic exclamation mark). This is eqs. 7.42 and 7.43 in Agostini
"Bayesian Reasoning in Data Analysis".
3a) side note: if we calculate the expectation value for l, by
multiplying with l and integrating over l from 0 to infinity, we
obtain E(l|k)=k+1; the same procedure gives the variance, which is
also k+1 (Agostini eqs 7.45 and 7.46)
4) for k=0 (zero counts observed in a single pixel), this reduces to
P(l|0)=e^(-l) for a single observation (pixel) (this is basic math; see also
§7.4.1 of Agostini).
5) since we have 100 independent pixels, we must multiply the individual PDFs
to get the overall PDF f, and also normalize to make the integral over that PDF
equal to 1: the result is f(l|all 100 pixels are 0)=n*e^(-n*l) with n=100 (basic
math). A more Bayesian procedure would be to realize that the posterior PDF
P(l|0)=e^(-l) of the first pixel should be used as the prior for the second
pixel, and so forth until the 100th pixel. This has the same result f(l|all 100
pixels are 0)=n*e^(-n*l) (Agostini § 7.7.2)!
6) the expectation value INTEGRAL_0_to_infinity over l*n*e^(-n*l) dl is 1/n .
This is 1 if n=1 as we know from 3a), and 1/100 for 100 pixels with 0 counts.
7) the variance is then INTEGRAL_0_to_infinity over
(l-1/n)^2*n*e^(-n*l) dl . This is 1/n^2
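The integrals in 6) and 7) can be checked numerically. The following is a minimal sketch using plain trapezoid integration; the function name exponential_posterior_moments, the cut-off of the integration range and the number of steps are assumptions chosen for illustration, not part of the derivation above.

```python
import math

def exponential_posterior_moments(n, span=60.0, steps=200_000):
    """Trapezoid-integrate f(l) = n*exp(-n*l) for l in [0, span/n].

    The upper limit span/n stands in for infinity; beyond it the integrand
    is negligible. Returns (normalization, expectation, variance) of l.
    """
    h = span / n / steps
    norm = first = second = 0.0
    for i in range(steps + 1):
        l = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        f = weight * n * math.exp(-n * l) * h
        norm += f             # integral of f(l)
        first += l * f        # integral of l*f(l)
        second += l * l * f   # integral of l^2*f(l)
    return norm, first, second - first ** 2

for n in (1, 100):
    norm, mean, var = exponential_posterior_moments(n)
    print(f"n={n}: norm={norm:.6f} mean={mean:.6g} var={var:.6g} "
          f"(expected 1/n={1 / n:.6g}, 1/n^2={1 / n ** 2:.6g})")
```

The variance is computed as E(l^2) - E(l)^2, which is equivalent to the integral over (l-1/n)^2 written in 7).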
I find these results quite satisfactory. Please note that they deviate from the
MLE result: expectation value=0, variance=0 . The problem appears to be that a
Maximum Likelihood Estimator may give wrong results for small n; something that
I've read a couple of times but which appears not to be universally
known/taught. Clearly, the results in 6) and 7) converge towards 0 for large n,
as they should.
What this also means is that one should really work out the PDF instead of just
adding expectation values and variances (and arriving at 100 if all 100 pixels
have zero counts) because it is contradictory to use a uniform prior for all
the pixels if OTOH these agree perfectly in being 0!
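To make "work out the PDF" concrete, here is a minimal sketch of the sequential update described in 5), under the assumption that all pixels share the same rate l. With a uniform prior, the posterior after observing total counts K in n pixels is proportional to l^K*e^(-n*l), i.e. a Gamma distribution with shape K+1 and rate n. The class name GammaPosterior and its methods are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GammaPosterior:
    """Posterior for a common Poisson rate l, proportional to l^(shape-1)*e^(-rate*l).

    The defaults (shape=1, rate=0) represent the uniform prior before any pixel
    has been observed; mean and variance are only defined after an update.
    """
    shape: float = 1.0
    rate: float = 0.0

    def update(self, counts):
        """Use this posterior as the prior for a further batch of pixel counts."""
        counts = list(counts)
        return GammaPosterior(self.shape + sum(counts), self.rate + len(counts))

    @property
    def mean(self):
        return self.shape / self.rate

    @property
    def variance(self):
        return self.shape / self.rate ** 2

# One patch of 100 zero-count pixels versus two patches of 50 combined in sequence.
single = GammaPosterior().update([0] * 100)
split = GammaPosterior().update([0] * 50).update([0] * 50)
print(single == split, single.mean, single.variance)
```

Combining at the level of the posterior gives the same answer however the 100 pixels are split, whereas adding the per-pixel expectation values of 1 gives 100. This sketch does not cover pixels with different scale factors, which would need their own treatment.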
What this means for zero-dose extrapolation I have not thought about. At least
it prevents infinite weights!
Best,
Kay
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list
hosted by www.jiscmail.ac.uk, terms & conditions are available at
https://www.jiscmail.ac.uk/policyandsecurity/
########################################################################