The short answer is that the question of zero counts can indeed be answered in a simple and practical manner. There's a lot of apparent mystique about probability theory especially when one invokes the term Bayesian probability theory. However, Bayesian probability theory is really just common sense about what's likely and what's unlikely converted into mathematics. It may seem tautologous but the most probable answer is the answer with the highest probability value – i.e. the maximum of the probability distribution function (pdf). This is quite obvious – the subtle bit is “what is the probability distribution function”. Normally with enough counts, we move over to a Gaussian pdf along the lines of
Gaussian pdf = const * exp(-0.5*((yobs-ycalc)/esd)^2)
This is a maximum when the negative log pdf is a minimum. The negative log pdf is simply
negative log pdf = const2 + 0.5*((yobs-ycalc)/esd)^2
In other words, least squares. The corollary is that least squares is associated with (and only associated with) a Gaussian pdf, which happily is the case almost all of the time in Rietveld analysis. However, when there are zeroes and ones in the observed counts, or when the model is incomplete or uncertain, then we have to move away from a Gaussian pdf and, by implication, least squares.
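As a minimal numerical sketch of that Gaussian/least-squares equivalence (the counts below are invented purely for illustration), in Python:

import numpy as np

# Scan a one-parameter flat model ycalc = t against some made-up counts.
yobs = np.array([10.0, 12.0, 9.0, 11.0])
esd = np.sqrt(yobs)                    # usual counting-statistics esd

trial = np.linspace(5.0, 15.0, 1001)
chi2 = np.array([np.sum(((yobs - t) / esd) ** 2) for t in trial])
log_pdf = -0.5 * chi2                  # log Gaussian pdf up to a constant

# The pdf maximum and the least-squares minimum pick out the same value.
assert trial[np.argmax(log_pdf)] == trial[np.argmin(chi2)]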
With zeroes and ones around, we have to move from a Gaussian pdf to a Poisson pdf and work with the negative log pdf of that. With incomplete models, we have to move over to the maximum likelihood methods that the macromolecular people have been using for years.
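For reference, the Poisson pdf for yobs counts given a model value ycalc is ycalc^yobs * exp(-ycalc) / yobs!, so its negative log is ycalc - yobs*ln(ycalc) + ln(yobs!). A minimal Python sketch (the function name is mine, not from any particular package):

import numpy as np
from scipy.special import gammaln

def poisson_neg_log_pdf(yobs, ycalc):
    # ycalc - yobs*ln(ycalc) + ln(yobs!); well defined at yobs = 0
    # provided ycalc > 0, since 0*ln(ycalc) = 0.
    return ycalc - yobs * np.log(ycalc) + gammaln(yobs + 1.0)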
So the bottom line is that fundamental statistics is as basic as fundamental parameters. You can fudge your way with a Pearson VII or a pseudo-Voigt function to fit an X-ray emission profile, but the physics says that there are better functions. Similarly, you can fudge your way by tweaking the Gaussian pdf when there are zero and one counts around, but you'd be better off moving over to the correct probability functions.
Antoniadis et al. have been through all of this back in the early 90s. I’ve got code (and it’s only a few lines) that is precisely Poisson and moves seamlessly over to the chi-squared metric. I’ll send it to you offline – it would be great to have the ability to program our own algebraic minimisation function in a TOPAS input file. I’d love to be able to do that for robust statistics and also maximum likelihood!
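For anyone who wants to experiment before then, here is a minimal Python sketch of that kind of metric (my own illustration, not Bill's actual code): twice the negative log of the Poisson likelihood ratio, as in Antoniadis et al., which handles yobs = 0 exactly and goes over to chi-squared for large counts.

import numpy as np
from scipy.special import xlogy

def poisson_chi2(yobs, ycalc):
    # 2 * sum[ ycalc - yobs + yobs*ln(yobs/ycalc) ], assuming ycalc > 0.
    # xlogy(0, .) = 0, so yobs = 0 is handled exactly; for large counts
    # each term tends to (yobs - ycalc)**2 / ycalc, i.e. chi-squared.
    yobs = np.asarray(yobs, dtype=float)
    ycalc = np.asarray(ycalc, dtype=float)
    return 2.0 * np.sum(ycalc - yobs + xlogy(yobs, yobs / ycalc))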
All the best,
Bill
-----Original Message-----
From: AlanCoelho [mailto:[EMAIL PROTECTED]]
Sent: 13 October 2006 22:30
To: rietveld_l@ill.fr
Subject: RE: About zero counts etc.
Hi Bill and others
Why can't this question of zero counts be answered in a simple and practical manner? I hope to read Devinder Sivia's excellent book one day, but for the time being it would be useful if the statistical heavyweights were to advise on what weighting is appropriate without everyone having to understand the details.
The original question was how to get XFIT to load data files with zero counts; obviously setting the values to 1 is incorrect. Joerg Bergmann seems to indicate that the weighting should be:
weighting = 1 / (Yobs+1);
as the esd is sqrt(n+1). Again, without reading the book, should the weighting be:
weighting = 1 / If(Yobs, Yobs, Ycalc);
Hopefully these spreadsheet-type formulas are understandable. This last equation is not liked by computers due to a possible zero divide when Ycalc is zero.
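In spreadsheet-free terms, both candidates look like this in Python (a sketch; the floor value is an arbitrary assumption of mine, just to dodge the zero divide):

def weight_bergmann(yobs):
    # Joerg's suggestion: esd = sqrt(n + 1), so weight = 1 / (n + 1)
    return 1.0 / (yobs + 1.0)

def weight_mixed(yobs, ycalc, floor=1e-6):
    # 1 / If(Yobs, Yobs, Ycalc), with a small floor so that
    # Yobs = Ycalc = 0 cannot cause a zero divide
    denom = yobs if yobs > 0 else ycalc
    return 1.0 / max(denom, floor)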
Any ideas, Bill and others?
Alan
________________________________
From: David, WIF (Bill) [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 12 October 2006 4:48 PM
To: rietveld_l@ill.fr
Subject: RE: About zero counts etc.
Dear all,
Jon's right - when the counts are very low - i.e. zeroes and ones around -
then the correct Bayesian approach is to use Poisson statistics. This, as
Jon said, has been tackled by Antoniadis et al. (A. Antoniadis, J. Berruyer and A. Filhol, "Maximum-likelihood methods in powder diffraction refinements", Acta Cryst. (1990), A46, 692-711) in the context of the Rietveld method some years ago. This paper is very informative for those who are intrigued by whether you can do anything at all when diffraction patterns have lots of zeroes and ones around. Curiously, the weighting ends up having as much
to do with the model value (which can, of course, be non-integer) as the
data. Devinder Sivia's excellent OUP monograph, "Data Analysis: a Bayesian
Tutorial" (http://www.oup.co.uk/isbn/0-19-856832-0) discusses all of this in
a very readable way.
Bill
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 12 October 2006 00:37
To: rietveld_l@ill.fr
Subject: Re: About zero counts etc.
Hello Joerg,
> -Having measured n counts, the estimated value is n+1
You might have a hard time convincing me on that one.
> -Having measured n counts, the esd is also sqrt(n+1)!
If n is zero then spending more time on the data collection might be better than more time on the analysis.
> Things change with variable counting times.
id31sum uses counts=counts and esd=sqrt(counts+alp), where alp=0.5 is the default and can be overridden on the command line. Perhaps there aren't many people who use that option. Should we change the default? The 0.5 came from the literature, but it was some time ago and I can't remember where. In any case it then gets convoluted with the monitor error. Sqrt(n+1) gives a very low chi^2 if the actual background is 0.1 (eg: 1 count every 10 datapoints). Might be better to just use the Poisson itself, as in abfit [1].
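(For concreteness, the id31sum convention as described amounts to something like the following Python sketch; the function name is mine:)

import numpy as np

def id31sum_esd(counts, alp=0.5):
    # esd = sqrt(counts + alp); alp = 0.5 by default, overridable
    return np.sqrt(np.asarray(counts, dtype=float) + alp)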
> the above correction for the estimated
> values gave significant better R values.
Are you using background-subtracted R-values? If only R-values were significant.
Jon
[1] A. Antoniadis, J. Berruyer and A. Filhol, "Maximum-likelihood methods in powder diffraction refinements", Acta Cryst. (1990), A46, 692-711.