On 1/15/2013 4:27 PM, Ben Johnson wrote:
> On 1/15/2013 4:05 PM, Bowie Bailey wrote:
>> On 1/15/2013 3:47 PM, Ben Johnson wrote:
>>> One final question on this subject (sorry...).
>>>
>>> Is there value in training Bayes on messages that SA classified as spam
>>> *due to other test scores*? In other words, if a message is classified
>>> as SPAM due to a block-list test, but the message is new enough for
>>> Bayes to assign a zero score, should that message be kept and fed to
>>> sa-learn so that Bayes can soak up all the tokens from a message that is
>>> almost certainly spam (based on the other tests)?
>>>
>>> Am I making any sense?
>> It is always worthwhile to train Bayes.  In an ideal world, you would
>> hand-sort and train every email that comes through your system.  The
>> more mail Bayes sees, the more accurate it can be.
>
> Thanks, Bowie. Given your response, would it then be prudent to call
> "sa-learn --spam" on any message that *other tests* (non-Bayes tests)
> determine to be spam (given some score threshold)?

That is exactly what the autolearn setting does. I let my system run with the default autolearn settings. Some people adjust the thresholds and some people prefer to turn off autolearn and do purely manual training.
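
As a rough illustration of the threshold idea behind autolearn, here is a simplified Python sketch. It is not SpamAssassin's actual code: the function name is hypothetical, the two default values are what I understand the stock `bayes_auto_learn_threshold_spam` and `bayes_auto_learn_threshold_nonspam` settings to be, and the real implementation applies additional conditions that are left out here.

```python
# Simplified model of threshold-based autolearning.
# Assumption: these mirror the stock SpamAssassin defaults; check your own config.
SPAM_THRESHOLD = 12.0  # learn as spam at or above this score
HAM_THRESHOLD = 0.1    # learn as ham below this score


def autolearn_decision(non_bayes_score,
                       spam_threshold=SPAM_THRESHOLD,
                       ham_threshold=HAM_THRESHOLD):
    """Return 'spam', 'ham', or 'no' for a score that excludes the Bayes rules."""
    if non_bayes_score >= spam_threshold:
        return "spam"
    if non_bayes_score < ham_threshold:
        return "ham"
    # Scores between the two thresholds are too ambiguous to learn from.
    return "no"
```

Only messages that land well outside the uncertain middle range are learned, which is why adjusting the two thresholds changes how aggressively the database is trained.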

> The crux of my question/point is that I don't want to have to feed
> messages that Bayes "misses" but that other tests identify *correctly*
> as spam to "sa-learn --spam".

At one point, I had a script running on my server that looked for messages that were marked as spam with a low Bayes rating (BAYES_00 to BAYES_40) or messages marked as ham with a high Bayes rating (BAYES_60 to BAYES_99). I was then able to check the messages and learn them properly. This let me learn from the edge cases that were not being scored properly by Bayes while still making it to the correct folder due to other rules.

If you do this, you MUST check the messages yourself prior to learning since there is no other way to know whether they should be learned as ham or spam.
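
A script along those lines could look roughly like the Python sketch below. This is not the script described above, only a minimal reconstruction of the idea. It assumes each message carries an `X-Spam-Status` header that starts with "Yes" or "No" and lists the rule names that hit; the function name and the two rule sets are illustrative choices. It only flags messages for manual review and deliberately does not call sa-learn.

```python
import re
from email import message_from_string

# Assumed Bayes rule names for the "low" and "high" ranges mentioned above.
LOW_BAYES = {"BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40"}
HIGH_BAYES = {"BAYES_60", "BAYES_80", "BAYES_95", "BAYES_99"}


def review_reason(raw_message):
    """Return why a message deserves a manual look, or None if it does not."""
    status = message_from_string(raw_message).get("X-Spam-Status")
    if status is None:
        return None  # message was never scanned
    # Collapse folded header lines into a single line.
    status = " ".join(str(status).split())
    is_spam = status.lower().startswith("yes")
    bayes_rules = set(re.findall(r"\bBAYES_\d+\b", status))
    if is_spam and bayes_rules & LOW_BAYES:
        return "spam with low Bayes score"
    if not is_spam and bayes_rules & HIGH_BAYES:
        return "ham with high Bayes score"
    return None
```

Anything the function flags would still need to be read by a person before being passed to "sa-learn --spam" or "sa-learn --ham".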

> Is there value in implementing something like this? Or is there some
> caveat that would make doing so self-defeating?

I find that Bayes autolearn works quite well for me, but others have had problems with it.

--
Bowie
