Re: sa-learn explained

Jim Maul Fri, 29 Dec 2006 09:28:47 -0800

Dave Koontz wrote:

I guess mileage varies. Auto-Learn has been a life saver for us and has
drastically reduced the false positives we used to get with emails to our
College's Health Care & Research departments.  We pass all local user email
through SA as well, so this really helps the system learn what is 'good'
email.


I'd suggest that everyone should at least try it and monitor the results.

I have found autolearn to be quite a valuable function here as well.
Keep in mind that I have adjusted the autolearn threshold values to
prevent things from being autolearned incorrectly. I would suggest
others do the same if they use autolearn. IMO, with the default scores,
it is too easy for false learning to occur. I use:


bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0

-Jim

-----Original Message-----
From: Nigel Frankcom [mailto:[EMAIL PROTECTED]
Sent: Friday, December 29, 2006 11:17 AM
To: users@spamassassin.apache.org
Subject: Re: sa-learn explained

On Fri, 29 Dec 2006 09:51:05 -0500, Andy Figueroa
<[EMAIL PROTECTED]> wrote:
I still feel like a tyro with SpamAssassin, but my installation is
catching better than 99% with perhaps 0.1% false positives (thanks in
large part to things I've learned from this list), and I think I can
tell you a couple of things more useful than just "read the manual".
(But, do read the manual!) My initial experience with SpamAssassin about
a year ago was through a large web hosting company, and I was limited to
playing with SpamAssassin through cpanel, although until they moved
SpamAssassin to its own server, I could also edit my own user
preferences directly. The problem was, this big company never could
get it right, so now I'm running my own mailserver(s) out of what
seemed like necessity. I'm running Gentoo with SA 3.1.7.

sa-learn is used to train the Bayesian database and keep it up to date.
So, turn on autolearn in your /etc/mail/spamassassin/local.cf so the
line reads:
bayes_auto_learn 1
(should be on by default).
This will cause selected spam and ham that you get to be used
automagically to keep the Bayesian database up to date.
I'm using maildir and have two subdirectories in my .maildir called:
2-learn-spam
2-learn-ham
I put missed spam in 2-learn-spam and ham misclassified as spam in
2-learn-ham. Then, whenever I have a few messages in one of those
directories, I run one of the following scripts:
learnspam.scr, which contains this line:
sa-learn --spam --progress /home/figueroa/.maildir/.2-learn-spam/cur

learnham.scr which contains this line:
sa-learn --ham --progress /home/figueroa/.maildir/.2-learn-ham/cur
This is on my personal mailserver. On the mailserver I run at a
school, I run that script on each user's 2-learn-spam/ham directories
every night under crontab.
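
The nightly per-user run described above could be sketched roughly as
follows. This is only an illustration, not the script actually used at the
school: the function names, the SA_LEARN override, and the idea of walking
every directory under one base directory are assumptions, and it presumes
either a site-wide Bayes database or a job that runs as the account owning
the database.

```shell
#!/bin/sh
# Hypothetical nightly trainer. The folder names follow the maildir layout
# described above (.2-learn-spam and .2-learn-ham); everything else is assumed.

# SA_LEARN can be overridden (for example SA_LEARN=echo) for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_user HOME_DIR: train from one user's two folders, if they exist.
learn_user() {
    maildir="$1/.maildir"
    for kind in spam ham; do
        dir="$maildir/.2-learn-$kind/cur"
        # Skip users who have not created this training folder.
        [ -d "$dir" ] || continue
        $SA_LEARN "--$kind" "$dir"
    done
}

# learn_all BASE_DIR: visit every home directory found under BASE_DIR.
learn_all() {
    for home in "$1"/*; do
        [ -d "$home" ] || continue
        learn_user "$home"
    done
}
```

A crontab entry would then source this file and call something like
`learn_all /home`. Trying it first with SA_LEARN=echo shows which
directories would be learned from without touching the database.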
Run an up-to-date version of SpamAssassin. I was having pretty good
results with 3.1.3 (the unmasked version in Gentoo), but got
immediately better results when I upgraded to the current version.
Also, to keep your RULES up to date, run sa-update as root from
time to time.
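
One way to automate that is a small wrapper that restarts spamd only when
sa-update actually installed something. This is a sketch under assumptions:
the RESTART_CMD path is a guess that depends on how spamd is started on
your system, and the function name is made up for illustration. It relies
on sa-update exiting with 0 when new rules were installed and 1 when none
were available.

```shell
#!/bin/sh
# Sketch of a periodic rule update, meant to be run as root.
# SA_UPDATE and RESTART_CMD are assumptions; adjust both for your system.
SA_UPDATE="${SA_UPDATE:-sa-update}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/spamd restart}"

update_rules() {
    $SA_UPDATE
    status=$?
    case "$status" in
        0)  # New rules were installed; restart so spamd loads them.
            $RESTART_CMD
            ;;
        1)  # No newer rules were available; nothing to do.
            ;;
        *)  # Anything else is treated as a failure and reported.
            echo "sa-update failed with exit status $status" >&2
            return "$status"
            ;;
    esac
}
```

Calling update_rules from a weekly or nightly cron job would cover the
"from time to time" part.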
Good luck!  Happy spamassassaning!
Personally, I'd disagree with auto-learn; having used SA in a production
environment for some years I've found manual training to be a better
solution.

YMMV

Just my 2 (pick your currency) worth.

Nigel
