Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new

Bowie Bailey Tue, 15 Jan 2013 13:40:09 -0800

On 1/15/2013 4:27 PM, Ben Johnson wrote:

On 1/15/2013 4:05 PM, Bowie Bailey wrote:

On 1/15/2013 3:47 PM, Ben Johnson wrote:

One final question on this subject (sorry...).


Is there value in training Bayes on messages that SA classified as spam
*due to other test scores*? In other words, if a message is classified
as SPAM due to a block-list test, but the message is new enough for
Bayes to assign a zero score, should that message be kept and fed to
sa-learn so that Bayes can soak-up all the tokens from a message that is
almost certainly spam (based on the other tests)?

Am I making any sense?

It is always worthwhile to train Bayes.  In an ideal world, you would
hand-sort and train every email that comes through your system.  The
more mail Bayes sees the more accurate it can be.

Thanks, Bowie. Given your response, would it then be prudent to call
"sa-learn --spam" on any message that *other tests* (non-Bayes tests)
determine to be spam (given some score threshold)?

That is exactly what the autolearn setting does. I let my system run
with the default autolearn settings. Some people adjust the thresholds
and some people prefer to turn off autolearn and do purely manual
training.
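
For reference, the autolearn behavior mentioned above is driven by a
handful of options in local.cf. The fragment below is only a sketch: the
values shown are what I understand the stock defaults to be, so check
them against the documentation for your SpamAssassin version before
relying on them. Also note that the score compared against these
thresholds is calculated without the Bayes rules themselves.

```
# Sketch of the autolearn options (values are assumed stock defaults)
bayes_auto_learn 1                        # set to 0 for purely manual training
bayes_auto_learn_threshold_nonspam 0.1    # learn as ham at or below this score
bayes_auto_learn_threshold_spam 12.0      # learn as spam at or above this score
```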

The crux of my question/point is that I don't want to have to feed
messages that Bayes "misses" but that other tests identify *correctly*
as spam to "sa-learn --spam".

At one point, I had a script running on my server that looked for
messages that were marked as spam with a low Bayes rating (BAYES_00 to
BAYES_40) or messages marked as ham with a high Bayes rating (BAYES_60
to BAYES_99). I was then able to check the messages and learn them
properly. This let me learn from the edge cases that were not being
scored properly by Bayes while still making it to the correct folder due
to other rules.

If you do this, you MUST check the messages yourself prior to learning
since there is no other way to know whether they should be learned as
ham or spam.
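
The kind of check described above could be sketched roughly as follows.
This is not the original script, just a minimal Python illustration under
some assumptions: the verdict and rule names are read from an
X-Spam-Status header (the exact header layout depends on how SA or amavis
is configured), and the function name and return labels are made up for
the example. It only flags messages for manual review; it deliberately
does not call sa-learn.

```python
import re
from email import message_from_string

# Matches Bayes rule names such as BAYES_00, BAYES_40, BAYES_99.
BAYES_RE = re.compile(r"\bBAYES_(\d{2,3})\b")


def bayes_edge_case(raw_message):
    """Return a label if Bayes disagrees with the final verdict, else None.

    Hypothetical helper: assumes an X-Spam-Status header that starts with
    "Yes" or "No" and lists the rules that fired.
    """
    msg = message_from_string(raw_message)
    # Collapse folded header lines into a single line of text.
    status = " ".join(msg.get("X-Spam-Status", "").split())

    match = BAYES_RE.search(status)
    if match is None:
        return None  # no Bayes rule fired, nothing to compare

    bucket = int(match.group(1))
    is_spam = status.lower().startswith("yes")

    if is_spam and bucket <= 40:
        return "spam-with-low-bayes"   # review, then possibly learn as spam
    if not is_spam and bucket >= 60:
        return "ham-with-high-bayes"   # review, then possibly learn as ham
    return None
```

Anything this flags still has to be looked at by a person before it goes
to "sa-learn --spam" or "sa-learn --ham", for the reason given above.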

Is there value in implementing something like this? Or is there some
caveat that would make doing so self-defeating?

I find that Bayes autolearn works quite well for me, but others have had
problems with it.


--
Bowie
