On 02/20/2014 06:44 PM, Amir Caspi wrote:
On Feb 20, 2014, at 10:34 AM, Axb <axb.li...@gmail.com> wrote:
I hope you're running SA 3.4 so:
I am still on 3.3.2 because nobody has yet packaged 3.4 for CentOS
5.x, from what I can tell. I have the package from the
rpmforge-extras repo, and 3.3.2 is still the most current version
there (and on Atomic and AtRPMs).
I'm not sure who is responsible for updating the packages, but I'll
probably have to wait a while until they get 3.4 uploaded there.
Assuming you can check maillogs and can either detect some spammed
unknown user patterns or have a dedicated trap domain to spare,
I'd accept that mail and write some header rules to score the trap
rcpt/domain REAL high and use a rule like
tflags RULENAME autolearn_force
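For illustration, such a trap rule might look like the sketch below. The rule name LOCAL_TRAP_RCPT, the domain trap.example.com and the score of 20 are made-up placeholders, not anything taken from a real setup. Matching on ToCc is also an assumption: if your MTA records the envelope recipient in another header, match on that instead. Note that the autolearn_force tflag only exists in SA 3.4.0 and later.

```
# Hypothetical spamtrap rule: name, domain and score are placeholders.
header   LOCAL_TRAP_RCPT  ToCc =~ /\@trap\.example\.com/i
describe LOCAL_TRAP_RCPT  Mail addressed to the spamtrap domain
score    LOCAL_TRAP_RCPT  20.0
# Requires SA >= 3.4.0: drops the usual "3 points from header rules and
# 3 from body rules" condition for autolearning as spam.
tflags   LOCAL_TRAP_RCPT  autolearn_force
```

With a score this high, anything hitting the trap should clear the default bayes_auto_learn_threshold_spam of 12 on its own.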
I'm not entirely sure what you mean here. Are you saying to use a
honeypot/spamtrap to feed the Bayes DB?
yep, exactly.
My problem is not that my Bayes DB doesn't have enough spam in it;
it's that these particular FNs are scoring BAYES_00. Let me note
that the Bayes DBs are per-user, not per-domain. Here's the magic
output from my Bayes DB:
Personally, I wouldn't use per-user Bayes DBs but a single site-wide
one, so all users get the benefit of your trapped data and learnt spam.
I'd bet you'd see a major improvement in spam detection and no FPs.
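A site-wide DB is usually set up in local.cf along these lines. This is only a sketch: the directory is a made-up example, and the file mode has to fit whichever users or daemons run SpamAssassin on your box.

```
# Hypothetical site-wide Bayes setup; the path is a placeholder.
use_bayes        1
bayes_auto_learn 1
# One shared DB for all users. The last path component is a filename
# prefix, giving files such as bayes_toks and bayes_seen.
bayes_path       /var/spool/spamassassin/bayes/bayes
# Must allow every account that scans or learns mail to write the DB.
bayes_file_mode  0770
```

Existing per-user data does not get merged automatically, so the shared DB would need to be trained (or restored from an sa-learn --backup dump) before it becomes useful.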
I don't think this counts as a "small" DB, does it?
well, not huge...let me brag :)
sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 17663091 0 non-token data: nspam
0.000 0 6768342 0 non-token data: nham
Bayes is set to autolearn, and I manually run sa-learn about once a
week on my spam folder (to learn the FNs, plus lower-scoring spam
that was not autolearned). MANY such image spams are caught
properly, including by Bayes; the problem is that some of them,
somehow, manage to slip through and score very low (BAYES_00 or BAYES_20). I
just have no idea how that is happening (which is why I should start
enabling token output in the headers and look), but that's why I was
thinking of scoring AC_SPAMMY_URI_PATTERNS very high if Bayes is
scoring very low, although I guess that kind of defeats the purpose
of Bayes and introduces the risk of FPs.
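As a rough sketch of that idea, a meta rule could add points only when the URI rule hits together with a low Bayes result. The name LOCAL_URI_LOW_BAYES and the score are placeholders picked for illustration, and the FP risk mentioned above still applies.

```
# Hypothetical meta rule: name and score are placeholders.
# Fires only when the URI pattern rule hits AND Bayes scored the mail low.
meta     LOCAL_URI_LOW_BAYES  (AC_SPAMMY_URI_PATTERNS && (BAYES_00 || BAYES_20))
describe LOCAL_URI_LOW_BAYES  Spammy URI pattern combined with a low Bayes score
score    LOCAL_URI_LOW_BAYES  3.0
```

Starting with a small score and watching the logs for hits on ham before raising it would be the cautious way to try this.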
It seems obvious that learning manually a week later isn't doing the
trick. IMO, you need a better way to learn "in the flow", such as an
IMAP folder to drop FNs into and a script that learns spam from there
every hour, for example...
Axb