Naming conventions for tests

Ben Kreunen Mon, 22 May 2006 17:07:45 -0700

Hi All

I've been approaching the problem of filtering spam at the email client end
using the SpamAssassin (3.x) header. Our email server (over which I have no
control) has a couple of server-side filters that reject emails with
infected attachments and messages with a spam score > 15. This leaves me
with about 100 spam messages per day.


Rather than rely on the numerical value of the X-Spam-Score header, I've been
looking at client-side filters that use text strings to pick out groups of
SpamAssassin tests. Many tests that are similar in nature share common text
strings, so a filter on a single term can cover a wide range of tests. The
effectiveness of this approach could be improved with a better naming scheme
for the tests.

The first filter I trialled picks up many tests for blacklisted domains/URLs
using two text strings:
X-Spam-Score contains "RCVD_IN" OR contains "BL_"

Unfortunately "RCVD_IN" also matches some "good" tests, so I had to split
this into two filters:
X-Spam-Score contains "RCVD_IN" AND does not contain "_IADB_" AND does not
contain "_BSP_"
X-Spam-Score contains "BL_"
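
For anyone who wants to try the same logic outside a mail client, the two
filters can be sketched in Python as below. This is only an illustration:
the function name and the sample header values are made up for the example,
and it assumes the header value is available as a plain string.

```python
def matches_blacklist_filters(header: str) -> bool:
    """Return True if the header value trips either client-side filter.

    `header` is assumed to be the raw text of the header that lists the
    SpamAssassin test names (hypothetical input for this sketch).
    """
    # Filter 1: "RCVD_IN" tests, skipping headers that mention the
    # "good" IADB or BSP tests.
    filter_one = (
        "RCVD_IN" in header
        and "_IADB_" not in header
        and "_BSP_" not in header
    )
    # Filter 2: any test name containing "BL_".
    filter_two = "BL_" in header
    return filter_one or filter_two


# Illustrative header values, not taken from real messages.
print(matches_blacklist_filters("RCVD_IN_SORBS_DUL,HTML_MESSAGE"))
print(matches_blacklist_filters("RCVD_IN_BSP_TRUSTED,HTML_MESSAGE"))
```

Because these are plain substring checks on the whole header, a message
that hits both a "good" RCVD_IN test and a blacklist RCVD_IN test will not
be caught by the first filter.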

While these two filters do not cover all blacklist tests (and include some
other types of tests), they do pick up 90% of spam (for me), including
messages with numerical scores as low as 0.35. The main problem with this
approach is that it requires monitoring the SpamAssassin tests being applied
as the software is updated, to ensure that the filters don't pick up
additional tests for "good" email.
On the positive side, the "learning" aspect of this filter is done by the
various blacklists.

If the SpamAssassin tests were named with more consistent text strings, it
would be simpler to set up client-side filters.
E.g.
All tests for blacklists contain "_BL_"
All tests for possible porn start with "PORN_"
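
With a scheme like that, the client-side logic would reduce to something
like the sketch below. The test names used here (RCVD_BL_EXAMPLE,
PORN_EXAMPLE) are hypothetical and follow the proposed convention, not
existing SpamAssassin names, and the sketch assumes the header lists test
names separated by commas.

```python
def classify(tests_field: str) -> set[str]:
    """Group test names into categories using the proposed naming scheme.

    `tests_field` is assumed to be a comma-separated list of test names.
    """
    categories = set()
    for name in tests_field.split(","):
        name = name.strip()
        if not name:
            continue
        # Proposed convention: every blacklist test contains "_BL_".
        if "_BL_" in name:
            categories.add("blacklist")
        # Proposed convention: every porn test starts with "PORN_".
        if name.startswith("PORN_"):
            categories.add("porn")
    return categories


# Hypothetical test names that follow the proposed convention.
print(classify("RCVD_BL_EXAMPLE,PORN_EXAMPLE,HTML_MESSAGE"))
```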

Cheers

Ben Kreunen

Imaging and IT Coordinator
Department of Pathology
The University of Melbourne
