Please note that the ZMI German rules are very old, and while there
have been a few recent tweaks to the file, it doesn't look terribly
useful to any system that uses the Bayesian filter (more on this
later).  I would expect these rules to fire quite rarely, even in
environments that have lots of German-language mail.


Yves added ZMI via an sa-update channel.  He confirmed the file is
present in the correct location but wants to verify that the rules
actually run.

This command will tell you whether SA is loading the configuration
file (the debug output should mention loading the ZMI rules):

  spamassassin --lint -D config 2>&1 |grep zmi_german

You can run lint without debug to see if SA takes issue with any of
the rules (no output means you're good):

  spamassassin --lint

Next, let's see if the rules ever trigger.  This is merely a question
of filtering your logs (assuming SA logging is enabled).

To do this, we'll first verify that the expected data is in your logs
and see how many messages SA scanned in this sampling period:

  zgrep -c 'spamd: result:' /var/log/mail.log*

Now let's look for rules from ZMI.  Since this rule set uses a common
prefix for all rules, this is an easy search:

  zgrep -c "spamd: result: .*ZMI" /var/log/mail.log*

I expect the results of the last two scans to be a very high number
for the total scanned message count and then a very low number (like
zero) for the ZMI-hitting message count.
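If you want to turn those two counts into a single hit-rate figure, a
bit of awk will do it (the counts below are made-up example values,
not from any real logs):

```shell
# Made-up example counts: total scanned vs. ZMI-hitting messages
total=52314
zmi=3

# Percentage of scanned mail that hit any ZMI rule
awk -v t="$total" -v z="$zmi" \
    'BEGIN { printf "ZMI hit rate: %.4f%%\n", 100 * z / t }'
# → ZMI hit rate: 0.0057%
```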


For completeness, here's how to actually search for the rules by name
(in any POSIX/Bourne-style shell such as bash, but not tcsh).  Note
the -E on the second command: the alternation needs extended regular
expressions:

  RULES=`egrep '^ *score' 70_zmi_german.cf |awk '{printf $2"|"}'`

  zgrep -Ec "spamd: result: .*(${RULES%?})" /var/log/mail.log*
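To sanity-check the extraction itself, here's the same pipeline run
against a tiny hand-made sample (the two rule names are hypothetical):

```shell
# Hypothetical two-rule excerpt standing in for 70_zmi_german.cf
cat > sample.cf <<'EOF'
score ZMIde_RULE_ONE 1.5
score ZMIde_RULE_TWO 2.0
EOF

# Same extraction as above: join the rule names with '|'
RULES=`egrep '^ *score' sample.cf |awk '{printf $2"|"}'`

# ${RULES%?} strips the trailing '|' that the awk printf leaves behind
echo "${RULES%?}"    # → ZMIde_RULE_ONE|ZMIde_RULE_TWO
```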


Finally, if you believe that the rules are being ignored, you can
compose a test to see whether that is actually the case.  Take a
*full* sample spam and feed it into SA with a replaced subject:

  formail -I "Subject: NLP Profis" < message.txt |spamassassin -t

You should see (among other things) a line noting that
ZMIde_SUBNLP_PROFI has been hit.


Stepping away from the ZMI issue and heading towards the larger
picture, what kind of spam are you trying to nail down with this
ruleset?  What goals did you hope to meet with the ZMI rules?  If it's
a specific type of spam, can you pastebin an example so we can help
you more directly?

Returning to my initial statement, I am under the impression that
this channel is useful only to victims of German spam who do not use
Bayes.  From a quick examination of the rules, the set appears to be
mostly geared at SA installations that cannot run Bayesian filtering,
since Bayes should be fully capable of catching everything those
rules catch (possibly excepting ZMISOBER_P_SPAM due to its
examination of several non-word elements) ... and Bayes should do a
better job, too.

Are you using Bayes?  Are you training it?
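The quickest way to answer both questions is 'sa-learn --dump magic',
which reports (among other counters) how many spam and ham messages
Bayes has learned; by default Bayes stays inactive until it has seen
200 of each.  A sketch of pulling those counters out (the dump text
below is illustrative sample output, not from any real system):

```shell
# Illustrative sample of 'sa-learn --dump magic' output (values made up)
dump='0.000          0          3          0  non-token data: bayes db version
0.000          0       1271          0  non-token data: nspam
0.000          0       1519          0  non-token data: nham'

# On a live system you would use:  dump=`sa-learn --dump magic`
nspam=`echo "$dump" | awk '/nspam/ {print $3}'`
nham=`echo "$dump" | awk '/nham/ {print $3}'`
echo "Bayes has learned $nspam spam and $nham ham"
# → Bayes has learned 1271 spam and 1519 ham
```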

Most people who want to improve their deployment's SA filtering
aren't properly utilizing the various plugins: DNSBLs, URIBLs, and
Bayes especially, but also things like Razor2, Pyzor, and DCC (if
legal).
Upgrading to SA 3.3.1 would be a big step up if you're not there
already (if you can't, you might want to consider a back-port of the
better DNSBLs to SA 3.2.x like my khop-bl channel).
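On the plugin front, Razor2, Pyzor, and DCC are enabled by
uncommenting their loadplugin lines in the .pre files shipped with SA
(v310.pre on most installs; the directory varies by distro, commonly
/etc/mail/spamassassin or /etc/spamassassin):

```
# In v310.pre -- uncomment to enable (DCC ships commented out by default)
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::DCC
```

After editing, re-run 'spamassassin --lint' to confirm the plugins
load cleanly.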

Testing on a piece of spam:

  spamassassin -D < msg.txt > debug.txt 2>&1

This should reveal (among MANY other lines) output similar to this:

[5841] dbg: async: completed in 0.240 s: DNSBL-A,
dns:A:107.49.73.222.zen.spamhaus.org.
[5841] dbg: async: completed in 0.249 s: URI-DNSBL,
DNSBL:multi.uribl.com.:www.net.cn
[5841] dbg: bayes: score = 1
[5841] dbg: razor2: results: spam? 1
[5841] dbg: pyzor: got response: public.pyzor.org:24441 (200, 'OK') 4 0
[5841] dbg: dcc: dccifd got response: X-DCC-SIHOPE-DCC-3-Metrics:
guardian.ics.com 1085; Body=1 Fuz1=many Fuz2=many


This hit all those flags because I tested on a spam previously run
through 'spamassassin -r' (which teaches Bayes and reports to Razor2
and others) ... you should still see results, even on ham.  What you
want from this test is simply successful connections to the servers,
not the spam/ham verdicts themselves.
