The short answer is that the question of zero counts can indeed be answered in a simple and practical manner. There's a lot of apparent mystique about probability theory especially when one invokes the term Bayesian probability theory. However, Bayesian probability theory is really just common sense about what's likely and what's unlikely converted into mathematics. It may seem tautologous but the most probable answer is the answer with the highest probability value – i.e. the maximum of the probability distribution function (pdf). This is quite obvious – the subtle bit is “what is the probability distribution function”. Normally with enough counts, we move over to a Gaussian pdf along the lines of
Gaussian pdf = const * exp(-0.5*((yobs-ycalc)/esd)^2)
This is a maximum when the negative log pdf is a minimum. The negative log pdf is simply
negative log pdf = const2 + 0.5*((yobs-ycalc)/esd)^2
In other words, least squares. The corollary is that least squares is associated with (and only associated with) a Gaussian pdf, which happily is the case almost all of the time in Rietveld analysis. However, when there are zeroes and ones in the observed counts, or when the model is incomplete or uncertain, then we have to move away from a Gaussian pdf and, by implication, least squares.
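As a minimal numerical sketch of that Gaussian/least-squares equivalence (the counts below are invented purely for illustration), in Python:

import numpy as np

# Scan a one-parameter flat model ycalc = t against some made-up counts.
yobs = np.array([10.0, 12.0, 9.0, 11.0])
esd = np.sqrt(yobs)                    # usual counting-statistics esd

trial = np.linspace(5.0, 15.0, 1001)
chi2 = np.array([np.sum(((yobs - t) / esd) ** 2) for t in trial])
log_pdf = -0.5 * chi2                  # log Gaussian pdf up to a constant

# The pdf maximum and the least-squares minimum pick out the same value.
assert trial[np.argmax(log_pdf)] == trial[np.argmin(chi2)]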
With zeroes and ones around, we have to move from a Gaussian pdf to a Poisson pdf and work with the negative log pdf of that. With incomplete models, we have to move over to the maximum likelihood methods that the macromolecular people have been using for years.
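For reference, the Poisson pdf for yobs counts given a model value ycalc is ycalc^yobs * exp(-ycalc) / yobs!, so its negative log is ycalc - yobs*ln(ycalc) + ln(yobs!). A minimal Python sketch (the function name is mine, not from any particular package):

import numpy as np
from scipy.special import gammaln

def poisson_neg_log_pdf(yobs, ycalc):
    # ycalc - yobs*ln(ycalc) + ln(yobs!); well defined at yobs = 0
    # provided ycalc > 0, since 0*ln(ycalc) = 0.
    return ycalc - yobs * np.log(ycalc) + gammaln(yobs + 1.0)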
So the bottom line is that fundamental statistics is as basic as fundamental parameters. You can fudge your way with a Pearson VII or a pseudo-Voigt function to fit an X-ray emission profile, but the physics says that there are better functions. Similarly, you can fudge your way by tweaking the Gaussian pdf when there are zero and one counts around, but you'd be better off moving over to the correct probability functions.
Antoniadis et al. have been through all of this back in the early 90s. I’ve got code (and it’s only a few lines) that is precisely Poisson and moves seamlessly over to the chi-squared metric. I’ll send it to you offline – it would be great to have the ability to program our own algebraic minimisation function in a TOPAS input file. I’d love to be able to do that for robust statistics and also maximum likelihood!
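For anyone who wants to experiment before then, here is a minimal Python sketch of that kind of metric (my own illustration, not Bill's actual code): twice the negative log of the Poisson likelihood ratio, as in Antoniadis et al., which handles yobs = 0 exactly and goes over to chi-squared for large counts.

import numpy as np
from scipy.special import xlogy

def poisson_chi2(yobs, ycalc):
    # 2 * sum[ ycalc - yobs + yobs*ln(yobs/ycalc) ], assuming ycalc > 0.
    # xlogy(0, .) = 0, so yobs = 0 is handled exactly; for large counts
    # each term tends to (yobs - ycalc)**2 / ycalc, i.e. chi-squared.
    yobs = np.asarray(yobs, dtype=float)
    ycalc = np.asarray(ycalc, dtype=float)
    return 2.0 * np.sum(ycalc - yobs + xlogy(yobs, yobs / ycalc))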
All the best,
Bill
-----Original Message-----
From: AlanCoelho [mailto:[EMAIL PROTECTED]]
Sent: 13 October 2006 22:30
To: rietveld_l@ill.fr
Subject: RE: About zero counts etc.
Hi Bill and others
Why can't this question of zero counts be answered in a simple and practical manner? I hope to read Devinder Sivia's excellent book one day, but for the time being it would be useful if the statistical heavyweights were to advise on what weighting is appropriate without everyone having to understand the details.
The original question was how to get XFIT to load data files with zero counts; obviously setting the values to 1 is incorrect. Joerg Bergmann seems to indicate that the weighting should be:
weighting = 1 / (Yobs+1);
as the esd is sqrt(n+1). Again, without reading the book, should the weighting be:
weighting = 1 / If(Yobs, Yobs, Ycalc);
Hopefully these spreadsheet-type formulas are understandable. This last equation is not liked by computers due to a possible zero divide when Ycalc is zero.
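In spreadsheet-free terms, both candidates look like this in Python (a sketch; the floor value is an arbitrary assumption of mine, just to dodge the zero divide):

def weight_bergmann(yobs):
    # Joerg's suggestion: esd = sqrt(n + 1), so weight = 1 / (n + 1)
    return 1.0 / (yobs + 1.0)

def weight_mixed(yobs, ycalc, floor=1e-6):
    # 1 / If(Yobs, Yobs, Ycalc), with a small floor so that
    # Yobs = Ycalc = 0 cannot cause a zero divide
    denom = yobs if yobs > 0 else ycalc
    return 1.0 / max(denom, floor)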
Any ideas, Bill and others?
Alan
________________________________
From: David, WIF (Bill) [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 12 October 2006 4:48 PM
To: rietveld_l@ill.fr
Subject: RE: About zero counts etc.
Dear all,
Jon's right - when the counts are very low - i.e. zeroes and ones around -
then the correct Bayesian approach is to use Poisson statistics. This, as
Jon said, has been tackled by Antoniadis et al. (A. Antoniadis, J. Berruyer and A. Filhol, "Maximum-likelihood methods in powder diffraction refinements", Acta Cryst. (1990), A46, 692-711) in the context of the Rietveld method some years ago. This paper is very informative for those who are intrigued by whether you can do anything at all when diffraction patterns have lots of zeroes and ones around. Curiously, the weighting ends up having as much
to do with the model value (which can, of course, be non-integer) as the
data. Devinder Sivia's excellent OUP monograph, "Data Analysis: a Bayesian
Tutorial" (http://www.oup.co.uk/isbn/0-19-856832-0) discusses all of this in
a very readable way.
Bill
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 12 October 2006 00:37
To: rietveld_l@ill.fr
Subject: Re: About zero counts etc.
Hello Joerg,
> -Having measured n counts, the estimated value is n+1
You might have a hard time convincing me on that one.
> -Having measured n counts, the esd is also sqrt(n+1)!
If n is zero then spending more time on the data collection might be better than more time on the analysis.
> Things change with variable counting times.
id31sum uses counts=counts and esd=sqrt(counts+alp), where alp=0.5 is the default and can be overridden on the command line. Perhaps there aren't many people who use that option. Should we change the default? The 0.5 came from the literature, but it was some time ago and I can't remember where. In any case it then gets convoluted with the monitor error. Sqrt(n+1) gives a very low chi^2 if the actual background is 0.1 (eg: 1 count every 10 datapoints). Might be better to just use the Poisson itself, as in abfit [1].
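(For concreteness, the id31sum convention as described amounts to something like the following Python sketch; the function name is mine:)

import numpy as np

def id31sum_esd(counts, alp=0.5):
    # esd = sqrt(counts + alp); alp = 0.5 by default, overridable
    return np.sqrt(np.asarray(counts, dtype=float) + alp)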
> the above correction for the estimated
> values gave significant better R values.
Are you using background-subtracted R-values? If only R-values were significant.
Jon
[1] A. Antoniadis, J. Berruyer and A. Filhol, "Maximum-likelihood methods in powder diffraction refinements", Acta Cryst. (1990), A46, 692-711.