Re: Bayes

tom Fri, 26 Jan 2007 03:49:19 -0800


On Jan 26, 2007, at 6:09 AM, Jack Gostl wrote:

The amount of spam getting through my filters has been steadily increasing. From a start of under two percent up to over ten percent. It was getting pretty bad, so I finally, just on a hunch, wiped my Bayes files and rebuilt them. And, voila!, I'm now running under one percent.
Has anyone else seen this? Are there any suggestions as to how to deal with this? Should I regularly rebuild the Bayes files?
Appreciate any advice.

Jack

I will attempt to answer your question as someone who has almost zero experience with SpamAssassin but years of experience with Bayesian filters.

You should not have to regularly rebuild your files. That is contrary to the whole notion of statistical filtering and, if it is true, may indicate that SA has a problem with its approach.

However you may have an issue with how you conduct your training.
It is this area where I might have some useful information to share...

As I understand it, which means I could be wrong, SpamAssassin will learn from past emails only when they score as sufficiently good or bad, and ignore the grey area in between. This means that your Bayesian filter is only going to pick up intelligence on the obvious spam and ignore everything in the grey area (just barely spam).
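To make the grey-area point concrete, here is a minimal sketch in Python of threshold-gated auto-learning. It is not SpamAssassin's actual code: the function name, the two threshold values, and the sample scores are all made up for illustration.

```python
# Hypothetical thresholds, chosen only for illustration.
HAM_THRESHOLD = 0.1    # at or below this score: confident ham
SPAM_THRESHOLD = 12.0  # at or above this score: confident spam


def autolearn_decision(score, ham_threshold=HAM_THRESHOLD,
                       spam_threshold=SPAM_THRESHOLD):
    """Return the label to learn the message as, or None to skip it."""
    if score >= spam_threshold:
        return "spam"
    if score <= ham_threshold:
        return "ham"
    return None  # grey area: the filter learns nothing from this message


# Made-up scores: the messages in the middle never reach the database.
for score in [-2.0, 0.05, 4.9, 5.3, 7.8, 15.2]:
    print(score, autolearn_decision(score))
```

Under these assumed thresholds, a message scoring 5.3 is just barely spam yet is never learned, which is exactly the kind of mail that keeps slipping through.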


Options are:

Greater reinforcement of learning what is spam/ham through user feedback. Take everything that is in this grey area and instruct SA on the good/bad status. This is really a refinement of the current autolearn or "train on everything". But again, I could be wrong about how SA deals with Bayesian learning.

"Train on error". Once you have a sufficient database of tokens, only train on those messages that you specifically identify as errors, disabling the auto-learn aspect of SA. This keeps the database small and prevents (or minimizes) database poisoning.

"Train to exhaustion", which means that once you tag an email as incorrectly classified, you keep training the database until it can score that email correctly. You might have to refeed an email into the database many times. But I don't think SA will even allow you to do this. Other Bayesian classifiers will.

My personal experience with other Bayesian classifiers has been that "train on error" is extremely effective over a long period of time, with a minimal impact on the database and on the performance of the applications.
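For anyone who wants to see the difference between "train on error" and "train to exhaustion", here is a rough sketch in Python. It uses a toy naive Bayes classifier written just for this example; ToyBayes, train_on_error, train_to_exhaustion, and the sample messages are all hypothetical and have nothing to do with how SA or any real filter is implemented.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny word-count naive Bayes classifier, for illustration only."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def classify(self, text):
        vocab = set(self.tokens["spam"]) | set(self.tokens["ham"])
        total_msgs = sum(self.messages.values())
        best_label, best_score = None, -math.inf
        for label in ("ham", "spam"):  # ties go to ham
            # Smoothed log prior plus smoothed log likelihood per token.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total = sum(self.tokens[label].values())
            for tok in text.lower().split():
                count = self.tokens[label][tok]
                score += math.log((count + 1) / (total + len(vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_on_error(clf, text, label):
    """Train only if the classifier currently gets this message wrong."""
    if clf.classify(text) != label:
        clf.train(text, label)
        return True
    return False


def train_to_exhaustion(clf, text, label, max_rounds=50):
    """Refeed a misclassified message until it is classified correctly."""
    rounds = 0
    while clf.classify(text) != label and rounds < max_rounds:
        clf.train(text, label)
        rounds += 1
    return rounds


clf = ToyBayes()
clf.train("meeting notes for monday", "ham")
clf.train("monday meeting agenda", "ham")
clf.train("cheap pills online", "spam")

missed = "monday meeting notes"  # pretend the user reported this as spam
print("before:", clf.classify(missed))
print("rounds needed:", train_to_exhaustion(clf, missed, "spam"))
print("after:", clf.classify(missed))
```

Note that train_on_error leaves the database untouched when the message is already classified correctly, which is why the database stays small. The max_rounds cap is only a safety net I added so the loop cannot run forever.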
