Re: Is the SA Bayes implementation mathematically sound?

Bill Cole Sun, 23 Dec 2018 12:50:02 -0800

On 22 Dec 2018, at 18:39, Damian wrote:

Hi all,


Is there someone who has a good grasp of the mathematics of Bayes
learning with respect to SpamAssassin?

Justin Mason would be the best person to discuss this. I do not know if he still reads this list.

I assume that training a fresh BayesStore with a set of spam and ham
samples is mathematically sound.


Nope.

I mean, it probably is sound for the initial static set of spam and ham it is trained with, until more training and expiration happen. So what? It will NEVER be a mathematically sound Bayesian classifier for the mail it is asked to classify. Never.

It is imperfect for any ongoing collection of spam and ham. There is no such thing as a valid sample of email which applies to the spam/ham classification of tomorrow's email. There are significant qualitative and quantitative differences over time for any target and across targets in any period of time. Exactly identical messages sent to multiple addresses may be ham for one target and spam for another.
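To make the per-target point concrete, here is a toy sketch (Python, not SpamAssassin's Perl code) of two users' independently trained token databases scoring the same message. Everything in it is a hypothetical simplification: the names `token_prob`, `score`, `alice` and `bob`, the smoothing constants `S` and `X`, the made-up token counts, and the plain naive-Bayes log-odds combination. SA's own Bayes plugin uses its own constants and combining step.

```python
import math

S = 0.16  # hypothetical smoothing strength (Robinson-style "s")
X = 0.5   # hypothetical neutral prior used for unseen tokens (Robinson-style "x")

def token_prob(spam_count, ham_count, nspam, nham):
    """Smoothed estimate of P(spam | token) from per-user counts."""
    n = spam_count + ham_count
    if n == 0:
        return X  # never seen: fall back to the neutral prior
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull rarely seen tokens toward X; well-attested tokens keep ~p.
    return (S * X + n * p) / (S + n)

def score(tokens, db):
    """Naive-Bayes log-odds sum (a simplification, not SA's combiner)."""
    log_odds = 0.0
    for t in tokens:
        s, h = db["tokens"].get(t, (0, 0))
        f = token_prob(s, h, db["nspam"], db["nham"])
        log_odds += math.log(f / (1.0 - f))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two hypothetical users who trained on different mail streams.
alice = {"nspam": 100, "nham": 100,
         "tokens": {"webinar": (40, 2), "invoice": (30, 5)}}
bob = {"nspam": 100, "nham": 100,
       "tokens": {"webinar": (1, 50), "invoice": (2, 40)}}

message = ["webinar", "invoice"]
alice_score = score(message, alice)  # close to 1: spam for this target
bob_score = score(message, bob)      # close to 0: ham for this target
```

The message is byte-for-byte identical in both cases; only the training each database has seen differs, so "the" probability that it is spam does not exist independently of the recipient.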

What bothers me a little is the
expiration logic.


Again the question is: so what?

As is shown almost every week on this list and almost every morning in the update to the default rules channel, spam is a moving target. As investment managers are required to say in the US: past performance is not an indicator of future results.
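The expiration worry can be illustrated with a similar toy store. This is a hypothetical sketch, not SA's expiry algorithm: the `BayesStore` and `TokenStats` classes, the integer `atime` timestamps and the `learn`/`expire`/`prob` methods are made up for illustration, and `S`/`X` are the same assumed smoothing constants as before. Tokens not seen since a cutoff are dropped while the message totals (`nspam`, `nham`) stay as they were, so an expired token silently falls back to the neutral prior.

```python
from dataclasses import dataclass, field

S = 0.16  # hypothetical smoothing strength
X = 0.5   # hypothetical neutral prior for unknown tokens

@dataclass
class TokenStats:
    spam: int
    ham: int
    atime: int  # hypothetical "last seen" timestamp

@dataclass
class BayesStore:
    nspam: int = 0
    nham: int = 0
    tokens: dict = field(default_factory=dict)

    def learn(self, tokens, is_spam, now):
        """Count one message and bump each distinct token's counts."""
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        for t in set(tokens):
            st = self.tokens.setdefault(t, TokenStats(0, 0, now))
            if is_spam:
                st.spam += 1
            else:
                st.ham += 1
            st.atime = now

    def expire(self, cutoff):
        """Drop tokens not seen since cutoff; message totals are untouched."""
        self.tokens = {t: st for t, st in self.tokens.items()
                       if st.atime >= cutoff}

    def prob(self, token):
        """Smoothed P(spam | token); unknown tokens get the prior X."""
        st = self.tokens.get(token)
        if st is None or self.nspam == 0 or self.nham == 0:
            return X
        n = st.spam + st.ham
        spam_ratio = st.spam / self.nspam
        ham_ratio = st.ham / self.nham
        p = spam_ratio / (spam_ratio + ham_ratio)
        return (S * X + n * p) / (S + n)

store = BayesStore()
store.learn(["cheap", "pills"], is_spam=True, now=1)
store.learn(["meeting", "agenda"], is_spam=False, now=1)
store.learn(["cheap", "flights"], is_spam=True, now=100)

before = store.prob("pills")  # learned from spam: well above 0.5
store.expire(cutoff=50)
after = store.prob("pills")   # expired: back to the neutral prior X
```

After expiry the store's totals still count messages whose tokens are partly gone, which is the kind of mismatch with a textbook estimator that the original question is about.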

The "Bayes" classifier in SA is an empirically useful tool, not an academic project. A better implementation might be one that conforms more rigorously to the underlying math, or it might not. A better implementation would be one that does a better job than the existing implementation at classifying today's mail, based on whatever training it has and remembers.

