Re: bayesian filter training

Matt Kettler 10 Feb 2005 22:23:56 -0000

At 05:06 PM 2/10/2005, Matias Lopez Bergero wrote:

Just a question: is it worth training the bayes filter with messages already detected and flagged as spam by spamassassin? Would that do any good?


Yes. And even if they are already flagged as BAYES_99 it is still worthwhile.

The reason is that bayes does not learn whether a message as a whole is spam or not. Bayes learns that a given set of words and tokens was seen in spam. A given spam message might be scored as spam and might already score high on the bayes scale, but it can still contain valuable new words to learn from. In particular, the constant mutation of drug-name spellings provides a steady stream of fresh spam indicators for bayes to learn about. Learning about these helps it identify future spam messages that might not otherwise look very spam-like, and offers you some protection from false negatives caused by spam mutations.
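To make the token-level point concrete, here is a minimal Python sketch. It is a toy illustration only, not SpamAssassin's actual tokenizer, storage, or scoring; the class name ToyTokenStore, its methods, and the sample messages are all hypothetical. It shows how a second spam message that shares most of its words with an earlier one can still contribute a token the store has never seen.

```python
import re
from collections import Counter


class ToyTokenStore:
    """Toy per-token spam counts (hypothetical; not SpamAssassin's real design)."""

    def __init__(self):
        # token -> number of learned spam messages that contained it
        self.spam_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Assumption: a crude lowercase word/number tokenizer is enough here.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def learn_spam(self, text):
        """Record the message's tokens and return those never seen in spam before."""
        tokens = self.tokenize(text)
        new_tokens = {t for t in tokens if t not in self.spam_counts}
        self.spam_counts.update(tokens)
        return new_tokens


store = ToyTokenStore()
store.learn_spam("cheap viagra online")

# A mutated spelling: most tokens are already known, but one is new.
new = store.learn_spam("cheap v1agra online")
print(sorted(new))
```

In this toy model the second message would already look spammy because of "cheap" and "online", yet learning it still adds the mutated spelling for future messages.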

The only time it's not worthwhile is if the message was already learned as spam (i.e., by the autolearner), but in that case SA will just ignore you. You're wasting some CPU time, but you won't damage or corrupt anything.
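The "SA will just ignore you" behaviour can be pictured with another small Python sketch. This is an assumption-laden toy, not SA's real bookkeeping: the ToyLearner class is hypothetical, and identifying a message by a SHA-1 digest of its raw text is an illustrative choice made here, not something stated in this thread.

```python
import hashlib


class ToyLearner:
    """Toy record of already-learned messages (hypothetical; not SA's real code)."""

    def __init__(self):
        self.seen = {}    # message digest -> label it was learned as
        self.learned = 0  # how many learn requests actually changed anything

    def learn(self, raw_message, label):
        """Learn a message; return False if it was already learned with this label."""
        # Assumption: a digest of the raw text is a good enough message identity.
        digest = hashlib.sha1(raw_message.encode("utf-8")).hexdigest()
        if self.seen.get(digest) == label:
            return False  # already learned: some CPU spent, nothing modified
        self.seen[digest] = label
        self.learned += 1
        return True


learner = ToyLearner()
first = learner.learn("Subject: cheap v1agra online", "spam")
second = learner.learn("Subject: cheap v1agra online", "spam")
print(first, second, learner.learned)
```

The repeat request costs a lookup but leaves the stored data untouched, which is the sense in which re-feeding an autolearned message is harmless.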
