Vernon Webb wrote:
Yesterday someone asked if I used sa-learn, and my response to myself was that I have
something else to learn. Can someone explain to me how to use it?
If I understand correctly, sa-learn can be used to train SA to recognize certain
messages as SPAM or HAM. I've run the sa-learn command, but it is not very clear how
it is meant to be used. I mean, I understand that if I use "sa-learn --spam" I can
train SA that something is SPAM, but on what, and from where? (See the example
commands below.) For instance, today the thing is not "Effie Present" but rather
"Happy NW Effie", so the effort I put in yesterday with the phish.ndb and scan.ndb
databases is still not catching these guys (although it is catching some phishing
scams).
I'm willing to try sa-learn, but what will that do for me? These guys are beginning to
drive me nuts, and obviously I have something wrong, as others tell me these messages
are being caught as SPAM on their systems.
Thanks
Tons of the "Happy NW (name)" spam is getting trapped here:
X-Spam-Level: xxxxxxxxxxxxxxxxxxxx
X-Spam-Status: Hits:20.2 Learn:no Tests:BAYES_99,DATE_IN_PAST_03_06,
HELO_DYNAMIC_DHCP,HELO_DYNAMIC_IPADDR,RCVD_FORGED_WROTE,RCVD_IN_SORBS_DUL,
SARE_LWSHORTT,SARE_MLB_Stock1,SARE_MLB_Stock2
auto_learn, auto_whitelist and auto_expire are on, and I have set a stricter
window for auto_learn and for when Bayes first kicks in:
bayes_min_ham_num 500
bayes_min_spam_num 500
bayes_auto_learn_threshold_nonspam -0.15
bayes_auto_learn_threshold_spam 15.0
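(For reference, the switches in local.cf that turn those features on are roughly the
following, assuming SA 3.x option names:)

use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 1
use_auto_whitelist 1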
Here is a dump of the bayes DB stats:
0.000 0 3 0 non-token data: bayes db version
0.000 0 9307 0 non-token data: nspam
0.000 0 2461 0 non-token data: nham
0.000 0 195899 0 non-token data: ntokens
0.000 0 1167342651 0 non-token data: oldest atime
0.000 0 1167411577 0 non-token data: newest atime
0.000 0 1167411583 0 non-token data: last journal sync atime
0.000 0 1167393605 0 non-token data: last expiry atime
0.000 0 50956 0 non-token data: last expire atime delta
0.000 0 86718 0 non-token data: last expire reduction count
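(That dump is just what sa-learn prints with its dump option:)

sa-learn --dump magic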
I have had this going for a few days now. I deposit anything scoring over 25 in a
special mailbox and then reject the message at SMTP time. Anything scoring over 5 is
placed in the individual account's spambox mailbox. I am also monitoring closely what
gets rejected. So far nothing legitimate has been rejected, and I have not noticed any
false positive/negative message get auto_learned, so the token data in the bayes DB
has not been corrupted. That is good, in a sense, as you don't want false
positive/negative messages getting auto_learned in one direction or the other.
Maybe I'm wrong. It works, so I will continue to monitor and be happy.
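If a miss ever did slip through, it could be corrected by hand with sa-learn rather
than waiting for auto-learn, something along these lines (the mailbox paths are only
examples):

# a spam that got through: teach Bayes it is spam
sa-learn --spam --mbox /path/to/missed-spam.mbox
# a legitimate message that landed in a spambox: learn it as ham
sa-learn --ham --mbox /path/to/false-positive.mbox
# flush the journal into the database when done
sa-learn --sync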