Vernon Webb wrote:
Yesterday someone asked if I used sa-learn, and my response to myself was that I have
something else to learn. Can someone explain to me how to use it?
If I understand correctly, sa-learn can be used to train SA to recognize certain
messages as SPAM or HAM. I've run the sa-learn command, but it is not very clear how
it is meant to be used. I mean, I understand that if I use "sa-learn --spam" I can
train SA that something is SPAM, but on what, and from where? (See the example
commands below.) For instance, today the thing is not "Effie Present" but rather
"Happy NW Effie", so the effort I put in yesterday with the phish.ndb and scan.ndb
databases is still not catching these guys (although it is catching some phishing
scams).
I'm willing to try sa-learn, but what will that do for me? These guys are beginning to
drive me nuts, and obviously I have something wrong, as others tell me these messages
are being caught as SPAM on their systems.
Thanks
Tons of the "Happy NW (name)" spam is getting trapped here:
X-Spam-Level: xxxxxxxxxxxxxxxxxxxx
X-Spam-Status: Hits:20.2 Learn:no Tests:BAYES_99,DATE_IN_PAST_03_06,
HELO_DYNAMIC_DHCP,HELO_DYNAMIC_IPADDR,RCVD_FORGED_WROTE,RCVD_IN_SORBS_DUL,
SARE_LWSHORTT,SARE_MLB_Stock1,SARE_MLB_Stock2
auto_learn, auto_whitelist and auto_expire are on, and I have set a stricter
window for auto_learn and for when Bayes first kicks in:
bayes_min_ham_num 500
bayes_min_spam_num 500
bayes_auto_learn_threshold_nonspam -0.15
bayes_auto_learn_threshold_spam 15.0
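(For reference, the switches in local.cf that turn those features on are roughly the
following, assuming SA 3.x option names:)

use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 1
use_auto_whitelist 1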
Here is a dump of the bayes DB stats:
0.000 0 3 0 non-token data: bayes db version
0.000 0 9307 0 non-token data: nspam
0.000 0 2461 0 non-token data: nham
0.000 0 195899 0 non-token data: ntokens
0.000 0 1167342651 0 non-token data: oldest atime
0.000 0 1167411577 0 non-token data: newest atime
0.000 0 1167411583 0 non-token data: last journal sync atime
0.000 0 1167393605 0 non-token data: last expiry atime
0.000 0 50956 0 non-token data: last expire atime delta
0.000 0 86718 0 non-token data: last expire reduction count
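(That dump is just what sa-learn prints with its dump option:)

sa-learn --dump magic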
I have had this going for a few days now. I deposit anything scoring over 25 in a
special mailbox and then reject the message at SMTP time. Anything scoring over 5 is
placed in the individual account's spambox mailbox. I am also monitoring closely what
gets rejected. So far nothing legitimate has been rejected, and I have not noticed any
false positive/negative message get auto_learned, so the token data in the bayes DB
has not been corrupted. That is good, in a sense, as you don't want false
positive/negative messages getting auto_learned in one direction or the other.
Maybe I'm wrong. It works, so I will continue to monitor and be happy.
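If a miss ever did slip through, it could be corrected by hand with sa-learn rather
than waiting for auto-learn, something along these lines (the mailbox paths are only
examples):

# a spam that got through: teach Bayes it is spam
sa-learn --spam --mbox /path/to/missed-spam.mbox
# a legitimate message that landed in a spambox: learn it as ham
sa-learn --ham --mbox /path/to/false-positive.mbox
# flush the journal into the database when done
sa-learn --sync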