On Wed, 29 Aug 2012 17:00:00 +0200
Mark Martinec wrote:

> Rob,

> The main purpose of bayes_seen is to prevent a stream of same-contents
> messages arriving in a short succession from polluting a bayes
> database.

I'd say that's more of a bug than a feature since you can only learn
one spam out of that stream, and that may not be enough for BAYES_99.
And it only happens when the sender fakes a Received header; otherwise
the spams get separate entries.

The main reason is so Bayes does the right thing without your having to
keep track of each email's history: not just whether it needs to be
learned, but also whether it needs to be unlearned before relearning.
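
To illustrate the bookkeeping I mean, here is a minimal Python sketch. It
is not SpamAssassin's code: the function names, the dict layout and the
"s"/"h" flags are all assumptions made up for the example. It only shows
how a per-message "seen" record lets a learn call decide between skipping,
learning, and unlearning before relearning.

```python
def _adjust(db, tokens, kind, delta):
    """Add delta to the message total and to each token count for kind."""
    # Hypothetical layout: db["tokens"][token] == [spam_count, ham_count]
    db["nspam" if kind == "s" else "nham"] += delta
    idx = 0 if kind == "s" else 1
    for tok in set(tokens):
        db["tokens"].setdefault(tok, [0, 0])[idx] += delta


def learn(seen, db, msg_id, tokens, as_spam):
    """Learn one message, consulting the 'seen' record for its history."""
    new = "s" if as_spam else "h"
    old = seen.get(msg_id)
    if old == new:
        return "skipped"              # already learned this way
    if old is not None:
        _adjust(db, tokens, old, -1)  # unlearn the earlier classification
    _adjust(db, tokens, new, +1)
    seen[msg_id] = new
    return "relearned" if old is not None else "learned"
```

The caller just says "this is spam" or "this is ham"; the seen record
supplies the history.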


On Wed, 29 Aug 2012 14:42:30 +0200
Rob Sterenborg wrote:

> > > What bothers me is that I can't update the
> > > spam_count and ham_count fields because AFAIK I don't have
> > > information about that.
> > 
> > You shouldn't normally touch those in an expire.
> 
> Why not? Is it not related to the number of tokens that are in the
> table?

The numbers of spam and ham mails learned are needed to compute
per-token probabilities for the tokens that are left after the expiry.
The total number of tokens is not relevant to the calculation.
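
As a rough illustration, here is a simplified per-token estimate in
Python. It is a sketch of the general idea only, not SpamAssassin's
actual formula (the real calculation applies further adjustments), and
the function and parameter names are made up for the example. The point
is that the inputs are the token's own counts plus the number of spam and
ham messages learned; the number of tokens in the table never appears.

```python
def token_spam_probability(token_spam, token_ham, nspam, nham):
    """Simplified estimate of how spammy a single token is.

    token_spam, token_ham: messages of each class containing the token
    nspam, nham: total spam and ham messages learned
    """
    if nspam == 0 or nham == 0:
        return 0.5  # nothing to compare against yet
    spam_ratio = token_spam / nspam
    ham_ratio = token_ham / nham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen in either class
    return spam_ratio / (spam_ratio + ham_ratio)


# Same token counts, different message totals, different result:
p_equal = token_spam_probability(10, 1, 100, 100)
p_more_ham = token_spam_probability(10, 1, 100, 1000)
```

Expiring other tokens leaves both results unchanged, whereas altering the
spam or ham message totals would shift the probability of every token
that remains.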
