Neil wrote:
> I'm wondering about the best way to train my Bayes filter (per-user
> filtering).
>
> I have a Junk folder, and it contains roughly three categories of mail
> (to my mind, at least):
> A. Mail SpamAssassin marked spam and auto-learned as spam.
> B. Mail SpamAssassin marked spam, but did not autolearn.
> C. Mail SpamAssassin did not mark spam, which I moved in there.
>
> So my questions:
> 1. Would it be bad for me to just run sa-learn on the entire Junk
> folder; or should I just let auto-learn do it's thing and sa-learn the
> false negatives?

No. It's not bad. If SA has already correctly learned the message, it
will be skipped. Of course, this means it's a waste of time to feed SA
messages it's already learned correctly, but it's not going to hurt
anything.

>
> 2. Likewise, my Inbox contains just ham; could I run sa-learn on that
> entire mailbox periodically?

Ditto.

>
> 3. Lastly, will it be detrimental (in terms of future accuracy) to
> sa-learn the same mail more than once, or will SpamAssassin remember
> it? (I seem to remember reading the latter, but I wasn't sure).

It will remember.

> If it does, how long/many previous mails does it remember?

Currently the bayes_seen mechanism has no expiration, so it will remember
forever, or until you manually delete bayes_seen.
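
To make the "already learned, so skipped" behaviour concrete, here is a toy Python model of a seen-list sitting in front of a token store. It is only a sketch and not SpamAssassin's actual code: the class name ToyBayesStore, the SHA-1-of-full-text message key, and the whitespace tokenizer are all made up for illustration. The "relearned" branch (forget the old classification, then learn the new one) reflects my understanding of what sa-learn does when a message was previously learned as the other type, so treat that part as an assumption too.

```python
import hashlib


class ToyBayesStore:
    """Toy model of a token store plus a 'seen' record (not SpamAssassin's code)."""

    def __init__(self):
        self.seen = {}          # message key -> "spam" or "ham"; never expires
        self.token_counts = {}  # token -> {"spam": n, "ham": n}

    @staticmethod
    def message_key(raw_message):
        # Assumption for this sketch: identify a message by a hash of its text.
        return hashlib.sha1(raw_message.encode("utf-8")).hexdigest()

    def _adjust(self, raw_message, label, delta):
        # Count each distinct token once per message.
        for token in set(raw_message.lower().split()):
            counts = self.token_counts.setdefault(token, {"spam": 0, "ham": 0})
            counts[label] += delta

    def learn(self, raw_message, label):
        key = self.message_key(raw_message)
        previous = self.seen.get(key)
        if previous == label:
            # Already learned with this classification: nothing changes.
            return "skipped"
        if previous is not None:
            # Learned before as the other type: undo that before relearning.
            self._adjust(raw_message, previous, -1)
        self._adjust(raw_message, label, +1)
        self.seen[key] = label
        return "learned" if previous is None else "relearned"


store = ToyBayesStore()
print(store.learn("cheap pills now", "spam"))  # learned
print(store.learn("cheap pills now", "spam"))  # skipped
print(store.learn("cheap pills now", "ham"))   # relearned
```

The second call leaves the token counts untouched, which is why feeding the whole Junk folder or Inbox back in repeatedly costs time but does not skew the statistics. Because entries in the seen record are never removed in this model, the record only grows, mirroring the no-expiration behaviour described above.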