Bob Proulx wrote:
> I am still trying to figure out why Bayes is giving so many false
> positives.
>
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0     101467          0  non-token data: nspam
> 0.000          0      39694          0  non-token data: nham
> 0.000          0     181047          0  non-token data: ntokens
> 0.000          0 1163102355          0  non-token data: oldest atime
> 0.000          0 1163306671          0  non-token data: newest atime
> 0.000          0 1163306671          0  non-token data: last journal sync atime
> 0.000          0 1163275571          0  non-token data: last expiry atime
> 0.000          0     172800          0  non-token data: last expire atime delta
> 0.000          0      30379          0  non-token data: last expire reduction count
>
> If I read that right, all of the tokens are from the 9th to the
> 11th.  Is that right?

Dunno, sounds about right.. my conversion of atimes sucks, but I can tell
you that the span from the oldest atime to the newest is only about 2.36
days, which fits your date range.
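Those atimes are just Unix epoch seconds, so if you want to double-check them
yourself (perl is obviously already on the box if SA is), something like this
works; the numbers below are simply the oldest/newest values from the dump
above:

  # convert the oldest/newest atimes to readable dates
  perl -le 'print scalar localtime shift' 1163102355
  perl -le 'print scalar localtime shift' 1163306671

  # span between them, in days
  perl -le 'print +(1163306671-1163102355)/86400, " days"'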
> In that case my suggestion to reduce the time is not going to help.
> But then why has Bayes locked on to so many bad tokens?  I wish there
> were some way to debug this.

To start with, run some of the false positives through "spamassassin -D
bayes".  That should print out the tokens that match, in plaintext, along
with their probabilities, which should at least tell you what your Bayes DB
has learned that's bad.  If it's not too horrible you might be able to use
sa-learn --backup to dump the DB, edit it by hand, and sa-learn --restore it.
However, you'd need to find the correct SHA1 of the offending tokens.. not
sure if that will be in the debug output.
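Roughly the sequence I mean (filenames here are just examples; -D, --backup,
--restore, and --dump are all documented options, but check the man pages for
your SA version):

  # debug output goes to stderr, so capture that for one of the false positives
  spamassassin -D bayes < false-positive.eml > /dev/null 2> bayes-debug.txt

  # dump the whole DB to text, prune the offending token lines by hand,
  # then load the edited copy back in
  sa-learn --backup > bayes-backup.txt
  # ... edit bayes-backup.txt ...
  sa-learn --restore bayes-backup.txt

  # if you need the token hashes, --dump data lists every token with its
  # spam/ham counts (tokens appear as hashes there, not plaintext)
  sa-learn --dump data > all-tokens.txt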