Re: Increase in Image Spam

Amir Caspi Thu, 20 Feb 2014 09:46:38 -0800

On Feb 20, 2014, at 10:34 AM, Axb <axb.li...@gmail.com> wrote:

> I hope you're running SA 3.4 so:


I am still on 3.3.2 because nobody has yet packaged 3.4 for CentOS 5.x, from 
what I can tell.  I have the package from the rpmforge-extras repo, and 3.3.2 
is still the most current version there (and on Atomic and ATrpms).

I'm not sure who is responsible for updating the packages, but I'll probably 
have to wait a while until they get 3.4 uploaded there.

> Assuming you can check maillogs and can either detect some spammed unknown 
> user patterns or have  a dedicated trap domain to spare, I'd accept that mail 
> and write some header rules to score the trap rcpt/domain REAL high and use a 
> rule like
> 
> tflags RULENAME autolearn_force
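
A minimal sketch of the spamtrap rule described in the quote, in SpamAssassin config syntax. The rule name LOCAL_SPAMTRAP_RCPT and the domain trap.example.com are hypothetical placeholders. The sketch assumes the trap address appears in the To or Cc header, which is what SpamAssassin's ToCc pseudo-header matches. As the quote notes, the autolearn_force tflag needs SA 3.4. The score is only an illustrative "REAL high" value.

```
# Hypothetical rule: match mail addressed to a dedicated trap domain
# (assumes the trap address shows up in To: or Cc:)
header   LOCAL_SPAMTRAP_RCPT  ToCc =~ /\@trap\.example\.com\b/i
describe LOCAL_SPAMTRAP_RCPT  Addressed to a dedicated spamtrap domain
# Illustrative high score so trapped mail is unambiguously spam
score    LOCAL_SPAMTRAP_RCPT  20.0
# SA 3.4+: force autolearning when this rule hits
tflags   LOCAL_SPAMTRAP_RCPT  autolearn_force
```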

I'm not entirely sure what you mean here.  Are you saying to use a 
honeypot/spamtrap to feed the Bayes DB?  My problem is not that my Bayes DB 
doesn't have enough spam in it; it's that these particular FNs are hitting 
BAYES_00.  Note also that the Bayes DBs are per-user, not per-domain.  Here's 
the magic output (sa-learn --dump magic) from my Bayes DB:

0.000          0          3          0  non-token data: bayes db version
0.000          0     239650          0  non-token data: nspam
0.000          0      85695          0  non-token data: nham
0.000          0     145773          0  non-token data: ntokens
0.000          0 1387110367          0  non-token data: oldest atime
0.000          0 1392917375          0  non-token data: newest atime
0.000          0 1392886526          0  non-token data: last journal sync atime
0.000          0 1392637273          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9005          0  non-token data: last expire reduction count

I don't think this counts as a "small" DB, does it?
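
Here is a quick worked reading of the figures above, to give a sense of scale. It is arithmetic on the dump only. The comparison with 200 assumes SpamAssassin's default bayes_min_spam_num and bayes_min_ham_num settings.

```latex
\frac{n_\text{spam}}{n_\text{ham}} = \frac{239\,650}{85\,695} \approx 2.80
\qquad
t_\text{newest} - t_\text{oldest} = 1\,392\,917\,375 - 1\,387\,110\,367 = 5\,807\,008\ \text{s} \approx 67.2\ \text{days}
```

Both message counts are far above the default minimum of 200 learned messages per class that Bayes requires before it is used. The tokens span a little over two months of mail.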

Bayes is set to autolearn, and I manually run sa-learn about once a week on my 
spam folder (to learn the FNs, plus lower-scoring spam that was not 
autolearned).  MANY such image spams are caught properly, including by Bayes; 
the problem is that some of them somehow slip through with very low Bayes 
scores (BAYES_00 or BAYES_20).  I have no idea how that is happening, which is 
why I should start adding the Bayes token output to the headers and take a 
look.  That's also why I was thinking of scoring AC_SPAMMY_URI_PATTERNS very 
high when Bayes scores very low, although I guess that kind of defeats the 
purpose of Bayes and introduces the risk of FPs.
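
Here is a minimal sketch of both ideas in the paragraph above, in SpamAssassin config syntax. The add_header lines use the _SPAMMYTOKENS(N)_ and _HAMMYTOKENS(N)_ template tags to expose the tokens behind a Bayes verdict; the header names are arbitrary. The meta rule LOCAL_URI_LOW_BAYES and its score are hypothetical. AC_SPAMMY_URI_PATTERNS is the name as written above, so check the exact rule name(s) in the installed ruleset before relying on it. Any score given to such a rule trades fewer FNs for more FP risk.

```
# Show the five spammiest and hammiest Bayes tokens in every scanned message
add_header all Bayes-Spammy _SPAMMYTOKENS(5)_
add_header all Bayes-Hammy  _HAMMYTOKENS(5)_

# Hypothetical meta rule: spammy URI pattern despite a low Bayes probability
# (rule name as written in the post; verify against the installed ruleset)
meta     LOCAL_URI_LOW_BAYES  AC_SPAMMY_URI_PATTERNS && (BAYES_00 || BAYES_05 || BAYES_20)
describe LOCAL_URI_LOW_BAYES  Spammy URI pattern but Bayes says ham
# Illustrative score; tune against your own ham to limit FPs
score    LOCAL_URI_LOW_BAYES  3.0
```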

-- Amir
