I have a site-wide Bayesian database that I trained some time ago with a few
hundred hams, and then since then I've trained spam into it anytime I
received a false negative.
[...]
I noticed something interesting - all the spam I've gotten in at least the
last few days has scored 0 on Bayes.

I am continuously learning everything for Bayes. I have autolearn on (it's the default) and explicitly learn all not-yet-learned ham and spam accordingly, including FPs. But that's only my account. The other users don't have their mail learned explicitly, so for them only autolearn applies.

Almost all spams that are half "content" and half random text score BAYES_99, so I think that's the way to do it. Whenever I look at the spam scores, I see BAYES_99 on spam and BAYES_50 or lower on ham. It's important to continuously learn everything so the system adapts to new mail characteristics. No mail is more "important" to learn than any other. Every mail is equally important.
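
To illustrate why continuous learning matters, here is a minimal Python sketch of a toy token-frequency classifier. It is not SpamAssassin's actual Bayes implementation; the class `ToyBayes`, its methods, and the sample messages are all made up for illustration. It only shows the general effect: a message made of tokens the database has never seen gets a neutral score, and the same kind of message scores clearly spammy once one example has been learned.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy naive-Bayes token classifier (hypothetical, not SpamAssassin's code)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    @staticmethod
    def tokenize(text):
        # Very crude tokenizer: lowercase, split on whitespace, one count per message.
        return set(text.lower().split())

    def learn(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_prob(self, token):
        # Smoothed per-class frequencies, so unseen tokens never give 0 or 1.
        s = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        h = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return s / (s + h)

    def score(self, text):
        # Combine the per-token probabilities in log-odds space.
        log_odds = 0.0
        for token in self.tokenize(text):
            p = self.token_prob(token)
            log_odds += math.log(p) - math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    clf = ToyBayes()
    clf.learn("kernel patch review thread", is_spam=False)
    clf.learn("meeting agenda for monday", is_spam=False)
    clf.learn("cheap pills special offer", is_spam=True)
    clf.learn("win free prize today", is_spam=True)

    new_spam = "refinance mortgage rates now"
    print("before learning:", clf.score(new_spam))
    clf.learn(new_spam, is_spam=True)
    print("after learning: ", clf.score(new_spam))
```

In this toy model the new spam type starts out neutral because none of its tokens are known, and moves toward the spam end only after it has been learned. That is the same reason for feeding both false negatives and ordinary ham into the real database on an ongoing basis.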

To help Bayes distinguish between spam and ham, I have subscribed to a few technical, medium-traffic, spam-free mailing lists, even though I don't read them regularly. Otherwise, the ham count would be a bit too low in my opinion.
