Re: Bayes Strategies

Neil Fri, 07 Nov 2008 20:44:42 -0800


On 7 Nov 2008, at 23:40, Matt Kettler wrote:

Neil wrote:

I'm wondering about the best way to train my Bayes filter (per-user
filtering).

I have a Junk folder, and it contains roughly three categories of mail
(to my mind, at least):
A. Mail SpamAssassin marked spam and auto-learned as spam.
B. Mail SpamAssassin marked spam, but did not autolearn.
C. Mail SpamAssassin did not mark spam, which I moved in there.

So my questions:
1. Would it be bad for me to just run sa-learn on the entire Junk
folder, or should I just let auto-learn do its thing and sa-learn the
false negatives?

No. It's not bad.

If SA has already correctly learned the message, it will be skipped. Of
course, this means it's a waste of time to feed SA messages it's already
learned correctly, but it's not going to hurt anything.


2. Likewise, my Inbox contains just ham; could I run sa-learn on that
entire mailbox periodically?

Ditto.


3. Lastly, will it be detrimental (in terms of future accuracy) to
sa-learn the same mail more than once, or will SpamAssassin remember
it?  (I seem to remember reading the latter, but I wasn't sure).

It will remember.

If it does, how long/many previous mails does it remember?

Currently the bayes_seen mechanism has no expiration, so it will
remember forever, or until you manually delete bayes_seen.



Thanks.

So then I think my strategy is going to be: sort the mail as usual, and
then every once in a while log into my server and run a script which
will call sa-learn on both mailboxes.
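
A minimal sketch of what such a script might look like, assuming both
mailboxes are stored in mbox format. The paths and the function names
(build_commands, train) are hypothetical and only there for
illustration; the sa-learn options used are --spam, --ham and --mbox.

```python
import subprocess
from pathlib import Path

# Hypothetical mailbox locations; adjust these for your own server layout.
JUNK_MBOX = Path.home() / "mail" / "Junk"
INBOX_MBOX = Path.home() / "mail" / "Inbox"


def build_commands(junk=JUNK_MBOX, inbox=INBOX_MBOX):
    """Return the sa-learn invocations for one training pass."""
    return [
        # Everything in Junk is treated as spam.
        ["sa-learn", "--spam", "--mbox", str(junk)],
        # Everything left in the Inbox is treated as ham.
        ["sa-learn", "--ham", "--mbox", str(inbox)],
    ]


def train(commands=None, runner=subprocess.run):
    """Run each command in order; check=True raises on the first failure."""
    if commands is None:
        commands = build_commands()
    for cmd in commands:
        runner(cmd, check=True)
```

Calling train() runs both passes. Since already-learned messages are
skipped, re-running it over the full mailboxes costs time but should
not skew the database. For Maildir storage, the --mbox option would be
dropped and the directory passed instead.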

-N.
