Vernon Webb wrote:
Yesterday someone asked if I used sa-learn, and my response to myself was that I have something else to learn. Can someone explain to me how to use it?

If I understand correctly, sa-learn can be used to train SA to recognize certain messages as SPAM or HAM. I've run the sa-learn command, but it is not very clear how it is used. I mean, I understand that if I use "sa-learn --spam" I can train SA that something is SPAM, but what, and where? For instance, today the thing is not "Effie Present" but rather "Happy NW Effie". So the efforts I made yesterday using the phish.ndb and scan.ndb databases are still not catching these guys (however, they are catching some phishing scams).

I'm willing to try sa-learn, but what will that do for me? These guys are beginning to drive me nuts, and obviously I have something wrong, as others are telling me these are being caught as SPAM on their systems.

Thanks


Tons of these are getting trapped here with the "Happy NW (name)" spam:

X-Spam-Level: xxxxxxxxxxxxxxxxxxxx
X-Spam-Status: Hits:20.2 Learn:no Tests:BAYES_99,DATE_IN_PAST_03_06,
        HELO_DYNAMIC_DHCP,HELO_DYNAMIC_IPADDR,RCVD_FORGED_WROTE,RCVD_IN_SORBS_DUL,
        SARE_LWSHORTT,SARE_MLB_Stock1,SARE_MLB_Stock2

auto_learn, auto_whitelist and auto_expire are on, and I have set a stricter window for auto_learn and a higher bar for when Bayes first kicks in:

  bayes_min_ham_num 500
  bayes_min_spam_num 500
  bayes_auto_learn_threshold_nonspam -0.15
  bayes_auto_learn_threshold_spam 15.0
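
Read as a window, the two thresholds mean: auto-learn as ham only below -0.15, auto-learn as spam only at 15.0 or above, and leave everything in between alone. A rough sketch of that decision follows. The helper name autolearn_window is made up for illustration, the exact boundary handling is an assumption, and it ignores SpamAssassin's other conditions. As I understand it, SpamAssassin also bases this decision on a score computed without the Bayes rules, so it need not match the Hits value shown in the header.

```shell
# Hypothetical helper: which way would a score fall relative to the
# auto-learn window configured above? Prints "ham", "spam" or "none".
# Simplified on purpose; this is not SpamAssassin's full auto-learn logic.
autolearn_window() {
  awk -v s="$1" -v lo="-0.15" -v hi="15.0" 'BEGIN {
    if (s + 0 < lo + 0)       print "ham"    # below the nonspam threshold
    else if (s + 0 >= hi + 0) print "spam"   # at or above the spam threshold
    else                      print "none"   # inside the window: not learned
  }'
}
```

For example, "autolearn_window 3.7" prints "none", which is the point of a strict window: borderline mail never reaches the Bayes DB automatically.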

Here is a dump of the bayes DB stats:

0.000    0          3    0  non-token data: bayes db version
0.000    0       9307    0  non-token data: nspam
0.000    0       2461    0  non-token data: nham
0.000    0     195899    0  non-token data: ntokens
0.000    0 1167342651    0  non-token data: oldest atime
0.000    0 1167411577    0  non-token data: newest atime
0.000    0 1167411583    0  non-token data: last journal sync atime
0.000    0 1167393605    0  non-token data: last expiry atime
0.000    0      50956    0  non-token data: last expire atime delta
0.000    0      86718    0  non-token data: last expire reduction count
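
The listing above is in the format printed by "sa-learn --dump magic". A small sketch for pulling the learned-message counts out of it, assuming that column layout (the function name bayes_magic_summary is made up for illustration):

```shell
# Hypothetical helper: read "sa-learn --dump magic" output on stdin and
# print the spam, ham and token counts on one line.
bayes_magic_summary() {
  awk '$5 == "non-token" && $6 == "data:" {
    # The third column holds the value for each non-token entry.
    if ($7 == "nspam")        nspam = $3
    else if ($7 == "nham")    nham = $3
    else if ($7 == "ntokens") ntokens = $3
  }
  END { printf "nspam=%d nham=%d ntokens=%d\n", nspam, nham, ntokens }'
}

# Intended use on a host with SpamAssassin installed:
#   sa-learn --dump magic | bayes_magic_summary
```

With the numbers above, both counts (9307 spam, 2461 ham) are past the 500-message minimums, so Bayes is active here.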

I have had this going now for a few days. I deposit anything scoring over 25 in a special mailbox and then reject the message at SMTP. Anything scoring over 5 is placed in the individual account's spambox mailbox. I am also closely monitoring what gets rejected. So far nothing legit has been rejected. I have also not noticed any false positive or false negative getting auto_learned, so the token data in the Bayes DB has not been skewed by misclassified mail. That is good, as you don't want false positives or negatives getting auto_learned in one direction or the other.

Maybe I'm wrong. It works, so I will continue to monitor and be happy.
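
On the original question of what sa-learn is pointed at: it takes messages that have already been sorted by hand, either individual files, a directory of messages, or an mbox file given with --mbox. A minimal sketch, assuming a hypothetical wrapper named sa_train and made-up paths (the SA_LEARN variable is only there so the command can be swapped out for a dry run):

```shell
# Hypothetical wrapper around sa-learn: the first argument picks the class,
# everything after it is passed through to sa-learn unchanged.
sa_train() {
  kind=$1
  shift
  case "$kind" in
    spam|ham) ;;
    *) echo "usage: sa_train spam|ham [sa-learn options] path..." >&2
       return 2 ;;
  esac
  "${SA_LEARN:-sa-learn}" "--$kind" "$@"
}

# Intended use (paths are examples only):
#   sa_train spam --mbox /path/to/missed-spam.mbox
#   sa_train ham /path/to/Maildir/cur
```

Training works best when both classes are fed, so the messages that slipped through as "Happy NW (name)" would go in as spam and a sample of known-good mail as ham.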
