Re: [ccp4bb] am I doing this right?

James Holton Tue, 19 Oct 2021 08:10:34 -0700

Thank you Gergely,

Oh, don't worry, I am not concerned about belief. Neither the model nor the data care what I believe.

What I am really asking is: what is the proper way to combine weak observations?

Right now, in pretty much all structural sciences, we are not used to doing this, but we are entering an era where we will have to.

I was trying to ask a simple question with the 10x10 pixel patch because (as Graeme, Ian and others pointed out) it highlights how the solution must also apply to two patches of 50 pixels. In reality, unfortunately, those two patches might not be next to each other and will have different Lorentz factors, polarization factors, absorption factors, and probably different partiality as well. These values are knowable, but they are not integers. The way we currently deal with all this is to first convert patches of pixels into an expectation and variance, then apply all the corrections, and finally "merge" everything with error propagation into a simple list of h,k,l,Iobs,sigIobs that we can compare to a PDB file.
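As a rough illustration of the conventional route described above (patch, then expectation and variance, then corrections, then a merge with error propagation), here is a minimal Python sketch. It assumes Poisson counting statistics per patch (expectation = variance = summed counts) and folds all corrections into one hypothetical multiplicative factor per observation; the function names and factor values are made up for illustration and are not taken from any data-reduction program.

```python
import math

def patch_estimate(counts):
    """Summed counts as the expectation; the Poisson variance equals the sum."""
    total = float(sum(counts))
    return total, total

def apply_correction(intensity, variance, factor):
    """Scale by one multiplicative factor (hypothetical stand-in for the
    combined Lorentz, polarization, absorption and partiality corrections).
    The variance scales with the square of the factor."""
    return intensity * factor, variance * factor ** 2

def merge(observations):
    """Inverse-variance weighted mean of (I, var) pairs and its sigma."""
    weights = []
    for _, var in observations:
        if var <= 0.0:
            # An all-zero patch has variance 0 here, i.e. an infinite weight.
            raise ValueError("zero variance would give an infinite weight")
        weights.append(1.0 / var)
    wsum = sum(weights)
    mean = sum(w * i for w, (i, _) in zip(weights, observations)) / wsum
    return mean, math.sqrt(1.0 / wsum)

# Two 50-pixel patches of the same reflection with different (made-up) factors.
obs_a = apply_correction(*patch_estimate([1] * 50), factor=1.2)
obs_b = apply_correction(*patch_estimate([2] * 50), factor=0.6)
iobs, sig_iobs = merge([obs_a, obs_b])
print(iobs, sig_iobs)
```

With an all-zero patch, patch_estimate returns a variance of 0 and merge refuses to continue; that is the infinite-weight problem Kay mentions at the end of his message.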

You are absolutely right that the best thing to do would be to fit a model of the whole diffractometer and crystal, structure factors included, directly and pixel-by-pixel to the image data. Some colleagues and I managed to do this recently (https://doi.org/10.1107/s2052252520013007). It is rather computationally expensive, but it seems to be working.
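To make the idea of fitting directly to pixels concrete, here is a deliberately tiny toy in Python: a single spot intensity I is fitted to a patch of raw counts by maximizing the Poisson log-likelihood, with expected counts I*p_i + b in pixel i. The spot profile p, the flat background b and the grid search are assumptions for illustration only; this is not the method or code of the paper linked above, which models the whole diffractometer and crystal.

```python
import math

def poisson_loglik(counts, expected):
    """Sum over pixels of log P(k|mu), with P(k|mu) = mu^k e^(-mu) / k!."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1)
               for k, mu in zip(counts, expected))

def fit_intensity(counts, profile, background, candidates):
    """Grid search for the spot intensity I that maximizes the pixel likelihood."""
    best_i, best_ll = None, -math.inf
    for i in candidates:
        expected = [i * p + background for p in profile]
        ll = poisson_loglik(counts, expected)
        if ll > best_ll:
            best_i, best_ll = i, ll
    return best_i

# Hypothetical 5-pixel spot profile (sums to 1) and a flat background of 1 count.
profile = [0.1, 0.2, 0.4, 0.2, 0.1]
background = 1.0
counts = [2, 3, 5, 3, 2]                      # equal to the expected counts for I = 10
candidates = [0.5 * j for j in range(1, 81)]  # I from 0.5 to 40
print(fit_intensity(counts, profile, background, candidates))  # 10.0
```

Because the log-likelihood is concave in I, the grid search cannot be trapped in a false maximum here; a realistic model with many parameters would need a proper optimizer.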

I hope this will be a useful tool, but I don't think such an approach will ever completely supplant data reduction, as the latter has many advantages. But only if you do the statistics right! This is why I asked the community, so that folks cleverer and more experienced than I in such matters (such as yourself) can correct me if I'm getting something wrong. And the community benefits from the discussion.


Thank you for your thoughtful and thought-provoking insights!

-James Holton
MAD Scientist


On 10/19/2021 2:05 AM, Gergely Katona wrote:

Dear James,

I am sorry to nitpick, but this is the answer to "what is my belief about the expectation
and variance if I observe a 10x10 patch of pixels with zero counts?" This will
heavily depend on my model.
When I make predictions like this, my intention is not to replace the data with
"new and improved" data that are closer to the Truth and deposit them in some
database from a position of authority.

I would simply use it to validate my model. Well, my model expects the Iobs to
be 0.01, but in fact it is 0. This may make me slightly worried, but then I
look at the posterior distribution and I see 0 with the highest posterior
probability, so I relax a bit, knowing that I do not have to throw out my model outright.
Still, a better model may be out there.
For a Bayesian, the data are fixed and holy; the model may change. And the question rarely manifests
like that, so one does not have to spend a lot of time pondering whether a uniform distribution of the
rate is compatible with my belief in some quantum process. Bayesian folks are pragmatic. Your
question about "what is my belief about the slope and intercept of a line that is the basis of
some time-dependent random process, given my observations?" is more relevant. It is
straightforward to implement as a Bayesian network to answer this question, and it will give you
predictions that look deceptively like the data. Here, you only care about your prior belief about
the magnitude of the slope and intercept; the belief about what the rate may be, independent of time, is
quite irrelevant, and so are the predictions that belief may make. And I guess you would not intend to
deposit images generated by the predictions of these posterior models as "new
and improved data".
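A minimal sketch of the kind of model described in the paragraph above, in Python: a Poisson rate that depends linearly on time, rate(t) = a + b*t, with a flat prior over a grid of intercepts a and slopes b. The grid, the flat prior and the toy counts are assumptions for illustration; a real analysis would more likely use a probabilistic-programming tool than brute-force enumeration.

```python
import math

def log_poisson(k, lam):
    """log P(k|lam) for a Poisson distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def grid_posterior(times, counts, intercepts, slopes):
    """Posterior over (a, b) on a grid with a flat prior and rate(t) = a + b*t > 0."""
    logpost = {}
    for a in intercepts:
        for b in slopes:
            rates = [a + b * t for t in times]
            if min(rates) <= 0.0:
                continue  # the prior excludes non-positive rates
            logpost[(a, b)] = sum(log_poisson(k, r) for k, r in zip(counts, rates))
    top = max(logpost.values())
    weights = {key: math.exp(v - top) for key, v in logpost.items()}
    z = sum(weights.values())
    return {key: w / z for key, w in weights.items()}

def predictive_mean(posterior, t):
    """Posterior-averaged rate at time t: the model's 'prediction' of the data."""
    return sum(p * (a + b * t) for (a, b), p in posterior.items())

times = [0, 1, 2, 3, 4]
counts = [2, 4, 6, 8, 10]                     # toy observations
intercepts = [0.5 * i for i in range(1, 11)]  # a from 0.5 to 5
slopes = [0.5 * j for j in range(0, 9)]       # b from 0 to 4
post = grid_posterior(times, counts, intercepts, slopes)
print(max(post, key=post.get))                # most probable (a, b)
print([round(predictive_mean(post, t), 2) for t in times])
```

The second line printed is the kind of prediction that "looks deceptively like the data"; it is a summary of the posterior, not new data.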

Best wishes,

Gergely


Gergely Katona, Professor, Chairman of the Chemistry Program Council
Department of Chemistry and Molecular Biology, University of Gothenburg
Box 462, 40530 Göteborg, Sweden
Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
Web: http://katonalab.eu, Email: gergely.kat...@gu.se

-----Original Message-----
From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of James Holton
Sent: 18 October, 2021 21:41
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] am I doing this right?

Thank you very much for this Kay!

So, to summarize, you are saying the answer to my question "what is the expectation 
and variance if I observe a 10x10 patch of pixels with zero counts?" is:
Iobs = 0.01
sigIobs = 0.01     (defining sigIobs = sqrt(variance(Iobs)))

And for the one-pixel case:
Iobs = 1
sigIobs = 1

but in both cases the distribution is NOT Gaussian, but rather exponential. And 
that means adding variances may not be the way to propagate error.
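A small numerical illustration of that point, in Python, using the exponential posterior n*e^(-n*l) for the per-pixel rate l of an all-zero patch of n pixels (from Kay's message below). It compares merging two 50-pixel patches with the usual Gaussian inverse-variance rule against treating all 100 pixels jointly; that both patches share one rate l is an assumption made for illustration.

```python
def zero_patch_posterior(n_pixels):
    """Mean and variance of l under f(l) = n e^(-n l) (all n pixels read 0)."""
    return 1.0 / n_pixels, 1.0 / n_pixels ** 2

def inverse_variance_merge(estimates):
    """Gaussian-style merge: weights 1/var, merged variance 1/sum(weights)."""
    weights = [1.0 / var for _, var in estimates]
    wsum = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / wsum
    return mean, 1.0 / wsum

merged = inverse_variance_merge([zero_patch_posterior(50), zero_patch_posterior(50)])
joint = zero_patch_posterior(100)
print("merged:", merged)
print("joint: ", joint)
```

The merge gives a mean of 0.02 with variance 0.0002, whereas the joint posterior gives 0.01 with variance 0.0001: the Gaussian rule does not know that each half-patch posterior is a skewed exponential, so it cannot reproduce the 100-pixel answer.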

Is that right?

-James Holton
MAD Scientist



On 10/18/2021 7:00 AM, Kay Diederichs wrote:

Hi James,

I'm a bit behind ...

My answer to the basic question you ask ("a patch of 100 pixels each with zero counts -
what is the variance?") is the following:

1) we all know the Poisson PDF (Probability Distribution Function) P(k|l) =
l^k*e^(-l)/k! (where k stands for an integer >=0 and l is lambda), which
tells us the probability of observing k counts if we know l. The PDF is
normalized: SUM_over_k P(k|l) for k=0...infinity is 1.
2) you don't know before the experiment what l is, and you assume it is some number x 
with 0<=x<=xmax (the xmax limit can be calculated by looking at the physics of 
the experiment; it is finite and less than the overload value of the pixel, otherwise 
you should do a different experiment). Since you don't know that number, all the x 
values are equally likely - you use a uniform prior.
3) what is the PDF P(l|k) of l if we observe k counts? That can be found with Bayes'
theorem, and it turns out that (due to the uniform prior) the right-hand side of the
formula looks the same as in 1): P(l|k) = l^k*e^(-l)/k! (again, the ! stands for the
factorial; it is not a semantic exclamation mark). These are eqs. 7.42 and 7.43 in Agostini,
"Bayesian Reasoning in Data Analysis".
3a) side note: if we calculate the expectation value of l, by
multiplying by l and integrating over l from 0 to infinity, we
obtain E(l|k)=k+1; the variance is also k+1 (Agostini eqs
7.45 and 7.46).
4) for k=0 (zero counts observed in a single pixel), this reduces to
P(l|0)=e^(-l) for a single observation (pixel) (this is basic math; see also
§7.4.1 of Agostini).
5) since we have 100 independent pixels, we must multiply the individual PDFs
to get the overall PDF f, and also normalize it so that its integral is 1:
the result is f(l|all 100 pixels are 0)=n*e^(-n*l) with n=100 (basic math). A
more Bayesian procedure would be to realize that the posterior PDF
P(l|0)=e^(-l) of the first pixel should be used as the prior for the second
pixel, and so forth until the 100th pixel. This gives the same result, f(l|all 100
pixels are 0)=n*e^(-n*l) (Agostini § 7.7.2)!
6) the expectation value INTEGRAL_0_to_infinity of l*n*e^(-n*l) dl is 1/n.
This is 1 if n=1, as we know from 3a), and 1/100 for 100 pixels with 0 counts.
7) the variance is then INTEGRAL_0_to_infinity of
(l-1/n)^2*n*e^(-n*l) dl. This is 1/n^2, so the standard deviation 1/n equals the expectation value.
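The numbered steps can be checked numerically. The following Python sketch evaluates the Poisson PDF of step 1, applies a uniform prior on [0, xmax] (step 2), multiplies the per-pixel likelihoods (step 5), normalizes on a grid, and returns the posterior mean and variance (steps 3a, 6 and 7). The grid size and the xmax values are assumptions chosen so that truncating the prior at xmax has a negligible effect; this is a brute-force check, not a suggestion for processing real images.

```python
import math

def poisson_pmf(k, lam):
    """Step 1: P(k|l) = l^k * e^(-l) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def log_poisson(k, lam):
    """log P(k|l); working in logs avoids underflow when many pixels are multiplied."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior_moments(counts, xmax, steps=4000):
    """Posterior mean and variance of l given independent pixel counts.

    Uniform prior on [0, xmax] (step 2); the likelihood is the product of the
    per-pixel Poisson probabilities (step 5), summed here in log space.
    The integrals are midpoint sums over a grid of `steps` cells.
    """
    dx = xmax / steps
    grid = [(i + 0.5) * dx for i in range(steps)]
    logpost = [sum(log_poisson(k, x) for k in counts) for x in grid]
    top = max(logpost)
    w = [math.exp(v - top) for v in logpost]  # unnormalized posterior
    z = sum(w)
    mean = sum(x * wi for x, wi in zip(grid, w)) / z
    var = sum((x - mean) ** 2 * wi for x, wi in zip(grid, w)) / z
    return mean, var

# Step 1: the PDF sums to 1 over k (truncated at k = 99 here).
print(sum(poisson_pmf(k, 3.7) for k in range(100)))
# Step 3a: one pixel with k = 3 counts; mean and variance close to k + 1 = 4.
print(posterior_moments([3], xmax=60.0))
# Steps 4, 6, 7 with n = 1: mean and variance close to 1.
print(posterior_moments([0], xmax=50.0))
# Steps 5-7 with n = 100: mean close to 1/100, variance close to 1/100^2.
print(posterior_moments([0] * 100, xmax=1.0))
```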

I find these results quite satisfactory. Please note that they deviate from the
MLE result: expectation value=0, variance=0. The problem appears to be that a
Maximum Likelihood Estimator may give wrong results for small n; this is something
I've read a couple of times, but it appears not to be universally
known/taught. Clearly, the results in 6) and 7) converge towards 0 for large n,
as they should.
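The comparison with the maximum-likelihood estimate can be written out in closed form. The Python sketch below assumes that all n pixels share one rate l and that xmax is large enough to ignore; under the uniform prior, the posterior for a total of K counts is then proportional to l^K*e^(-n*l), a Gamma distribution with mean (K+1)/n and variance (K+1)/n^2. That general form is a standard result added here for illustration; it reduces to 3a) for n=1 and to 6) and 7) for K=0. The MLE variance uses the usual Fisher-information approximation l/n evaluated at the estimate.

```python
def mle_estimate(counts):
    """Maximum likelihood: l_hat = K/n, variance from Fisher information, l/n."""
    n, total = len(counts), sum(counts)
    lam = total / n
    return lam, lam / n

def bayes_uniform_prior(counts):
    """Posterior proportional to l^K e^(-n l): mean (K+1)/n, variance (K+1)/n^2."""
    n, total = len(counts), sum(counts)
    return (total + 1) / n, (total + 1) / n ** 2

for n in (1, 10, 100, 1000):
    zeros = [0] * n
    print(n, mle_estimate(zeros), bayes_uniform_prior(zeros))
```

For all-zero pixels the MLE stays at (0, 0) for every n, while the Bayesian values (1/n, 1/n^2) shrink towards 0 as n grows.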
What this also means is that one should really work out the PDF instead of just
adding expectation values and variances (which gives 100 for each if all 100 pixels
have zero counts), because it is contradictory to use a uniform prior for each of
the pixels if, on the other hand, they agree perfectly in being 0!
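To put numbers on this contrast, here is a Python sketch for the summed intensity T of the whole patch. It assumes, as in 5), that all pixels share one rate l, so that T = n*l; the naive route instead adds the single-pixel results of 3a) (expectation k+1 and variance k+1 per pixel). The joint version uses the Gamma form of the posterior for K total counts (mean (K+1)/n, variance (K+1)/n^2), which reduces to 6) and 7) when K=0.

```python
def naive_patch_total(pixel_counts):
    """Add per-pixel posterior moments, each (k+1, k+1) as in 3a)."""
    total = sum(k + 1 for k in pixel_counts)
    return float(total), float(total)

def joint_patch_total(pixel_counts):
    """Joint posterior for the shared rate l (uniform prior), scaled to T = n*l.

    For K total counts over n pixels, l has mean (K+1)/n and variance
    (K+1)/n^2, so T = n*l has mean K+1 and variance K+1.
    """
    n, total = len(pixel_counts), sum(pixel_counts)
    mean_l, var_l = (total + 1) / n, (total + 1) / n ** 2
    return n * mean_l, n ** 2 * var_l

zeros = [0] * 100
print(naive_patch_total(zeros))  # expectation 100, variance 100
print(joint_patch_total(zeros))  # expectation about 1, variance about 1
```

The naive sum claims about 100 +/- 10 photons in a patch where none were seen, while the joint treatment gives about 1 +/- 1.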

What this means for zero-dose extrapolation I have not thought about. At least 
it prevents infinite weights!

Best,
Kay

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
