Hi Jim,
Jim Maul wrote:
Paul Boven wrote:
Bayes is a very powerful system, especially for recognising site-specific ham. But at the moment, approx. 30% of the spam that slips through my filter has 'autolearn=ham' set, and another 60% of the spam slipping through has a negative Bayes score to help it along. For now, I've disabled autolearning in my Bayes system.
If your system is autolearning 30% of the spam as ham, it is seriously screwed up.
No, fortunately that's not the case. Of all the spam that slips through (which is still just below 1%), about a third not only slips through but even gets autolearned the wrong way.
It only autolearns when it's pretty damn sure of its classification of the message in question. A bad Bayes database will only keep getting worse if left alone. The trick is to start out with good learning; from there it's cake. On some systems it's even less of an issue: I've manually sa-learn'ed maybe 20-30 messages in a little over a year of using SA. Everything else has been autolearned. It's rare that I see Bayes scores other than BAYES_00 and BAYES_99. I'd say my Bayes db is pretty damn accurate at this point, and it's done most of that on its own. Keep in mind, though, that I've altered the scores of some rules (mostly Bayes) and I've also adjusted the autolearn thresholds for my system. I've raised the spam threshold and lowered the ham threshold so nothing gets autolearned unless SA is REALLY sure it knows what it's doing. I'd tend to think it's easier to tweak the system a bit than to change the way Bayes/autolearning works... but hey, that's just me.
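As a rough illustration of the threshold idea above, here is a simplified Python model of threshold-based autolearning. It is not SpamAssassin's actual code: real SpamAssassin scores the message without the Bayes rules when deciding whether to learn, and applies further checks. The defaults 0.1 and 12.0 correspond to SpamAssassin's bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings; the function name and the "stricter" values are hypothetical, chosen only to mimic raising the spam cut-off and lowering the ham cut-off.

```python
from typing import Optional

# Defaults of SpamAssassin's bayes_auto_learn_threshold_nonspam / _spam.
DEFAULT_HAM_THRESHOLD = 0.1
DEFAULT_SPAM_THRESHOLD = 12.0

# Hypothetical stricter pair: lower ham cut-off, higher spam cut-off.
STRICT_HAM_THRESHOLD = -1.0
STRICT_SPAM_THRESHOLD = 15.0


def autolearn_decision(score: float,
                       ham_threshold: float = DEFAULT_HAM_THRESHOLD,
                       spam_threshold: float = DEFAULT_SPAM_THRESHOLD) -> Optional[str]:
    """Simplified model: return 'ham', 'spam', or None (not learned).

    `score` stands in for the score used for learning; real SpamAssassin
    computes it without the Bayes rules and applies extra checks.
    """
    if ham_threshold >= spam_threshold:
        raise ValueError("ham threshold must be below spam threshold")
    if score < ham_threshold:
        return "ham"     # confidently clean: learn as ham
    if score >= spam_threshold:
        return "spam"    # confidently spam: learn as spam
    return None          # uncertain: leave the Bayes database alone
```

With the defaults, a message scoring 0.05 is learned as ham; with the stricter pair it is not learned at all, which is the trade-off described above: fewer messages get autolearned, but fewer of them get learned the wrong way.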
Thanks for your response. What thresholds have you set for autolearning, and how exactly do you do your retraining? How many users does your SpamAssassin setup have?
Over here, the autolearning thresholds are still at their default values (though I've disabled autolearning for now). Retraining is done by sending the offending message back to the filter as a message/rfc822 attachment, and there are about 90 users on the system. My Bayes database is in fairly good shape, but some kinds of spam have managed to earn themselves a negative score.
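The retraining path described above (users forward a missed spam as a message/rfc822 attachment) could be automated roughly as follows. This is a sketch, not the actual setup used here: the function names are hypothetical, it assumes the report arrives as a raw message string, and it assumes sa-learn is on PATH and runs as the user that owns the Bayes database. Only the standard `email` package and sa-learn's `--spam`/`--ham` options are used.

```python
import email
import os
import subprocess
import tempfile
from email.message import Message
from typing import List


def extract_forwarded_messages(raw_report: str) -> List[Message]:
    """Return every message/rfc822 attachment found in a user's report."""
    report = email.message_from_string(raw_report)
    found = []
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            # For message/rfc822 parts the payload is a list holding
            # exactly one parsed Message object.
            found.append(part.get_payload(0))
    return found


def sa_learn_command(path: str, as_spam: bool = True) -> List[str]:
    """Build an sa-learn invocation for a single message file."""
    return ["sa-learn", "--spam" if as_spam else "--ham", path]


def learn_from_report(raw_report: str, as_spam: bool = True) -> int:
    """Feed each attached message to sa-learn; return how many were passed on."""
    count = 0
    for msg in extract_forwarded_messages(raw_report):
        # Write the attached message to a temporary file for sa-learn.
        with tempfile.NamedTemporaryFile("w", suffix=".eml", delete=False) as tmp:
            tmp.write(msg.as_string())
            path = tmp.name
        try:
            subprocess.run(sa_learn_command(path, as_spam), check=True)
        finally:
            os.remove(path)  # clean up even if sa-learn fails
        count += 1
    return count
```

Extracting the attachment first, rather than learning the whole report, keeps the forwarding user's own headers and text out of the Bayes database.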
Regards, Paul Boven.