On Wed, 29 Aug 2012 17:00:00 +0200 Mark Martinec wrote:

> Rob,
> The main purpose of bayes_seen is to prevent a stream of same-contents
> messages arriving in a short succession from polluting a bayes
> database.

I'd say that's more of a bug than a feature, since you can only learn
one spam out of that stream, and that may not be enough for BAYES_99.
And it only happens when the sender fakes a Received header; otherwise
the spams get separate entries.

The main reason is so that Bayes does the right thing without your
having to keep track of each email's history: not just whether it needs
to be learned, but also whether it needs to be unlearned before
relearning.

On Wed, 29 Aug 2012 14:42:30 +0200 Rob Sterenborg wrote:

> > > What bothers me is that I can't update the
> > > spam_count and ham_count fields because AFAIK I don't have
> > > information about that.
> >
> > You shouldn't normally touch those in an expire.
>
> Why not? Is it not related to the number of tokens that are in the
> table?

The numbers of spam and ham mails learned are needed to compute the
per-token probabilities for the tokens that are left after the expiry.
The total number of tokens is not relevant to the calculation.
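
To make that concrete, here is a rough sketch in Python of how a
per-token estimate can depend on the learned-message totals. It is a
simplified ratio, not SpamAssassin's actual code (which applies further
adjustments and combining), and the function name and the counts are
made up for illustration. Only spam_count and ham_count correspond to
the fields discussed above.

```python
def token_spam_probability(tok_spam, tok_ham, spam_count, ham_count):
    """Simplified per-token spam estimate (illustrative, not SpamAssassin's code).

    tok_spam / tok_ham: number of learned spam / ham mails containing the token.
    spam_count / ham_count: total number of spam / ham mails learned.
    """
    if spam_count == 0 or ham_count == 0:
        raise ValueError("need at least one learned spam and one learned ham")
    spam_ratio = tok_spam / spam_count  # fraction of learned spam with the token
    ham_ratio = tok_ham / ham_count     # fraction of learned ham with the token
    if spam_ratio + ham_ratio == 0:
        return 0.5                      # token never seen: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


# Hypothetical token that survived the expiry: seen in 30 spam and 10 ham.
# The number of other tokens in the table does not appear anywhere.
p_correct = token_spam_probability(30, 10, spam_count=1000, ham_count=1000)

# Same token counts, but spam_count wrongly halved during the expire:
# the estimate for every remaining token shifts towards spam.
p_skewed = token_spam_probability(30, 10, spam_count=500, ham_count=1000)

print(f"correct totals: {p_correct:.3f}, altered spam_count: {p_skewed:.3f}")
```

With these made-up numbers the estimate moves from about 0.75 to about
0.86 just from changing spam_count, which is why an expire should leave
those two fields alone.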