On Thu, 2014-08-28 at 17:42 -0500, Robert A. Ober wrote:
> If you find a solution without my having to write more rules I would 
> appreciate you letting the list know.
> 
This may not be what you want to know, but.....

Some time back I was subscribed to a mailing list that was linked to a
web forum which attracted a lot of spam and whose owners were averse to
any form of spam filtering apart from banning spammers after the fact.
Since I wanted to see the hammy list material, I couldn't easily filter
on headers and so was forced to develop another way to separate spam
from ham. My prime weapon was a sales blurb detector. This is basically
two rules that recognise product names and selling phrases combined by a
meta-rule that requires both rules to fire before it generates a hit.
The two base rules are both vast lists of alternate patterns. The
benefit of going this way is that now, quite a long time later, this
rule is still surprisingly effective in tagging UCE as spam. Better yet,
the lists now require quite infrequent updating, probably because there
is a limit to the types of product that get offered via UCE and an even
more limited set of phrases that are used to push it.
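A minimal Python sketch of the two-rules-plus-meta idea follows. It assumes short, hypothetical word lists (PRODUCT_NAMES and SELLING_PHRASES are illustrative, not the real lists, which are far larger). Each list is joined into one alternation regex, and the combined check only reports a hit when both match. In SpamAssassin itself this would be two `body` rules joined by a `meta` rule along the lines of `meta SALES_BLURB (PRODUCT_RULE && SELLING_RULE)`, where the rule names are again hypothetical.

```python
import re

# Hypothetical example entries; the real lists are much longer.
PRODUCT_NAMES = ["replica watch", "diet pill", "cheap meds"]
SELLING_PHRASES = ["order now", "limited time offer", "lowest price"]


def build_alternation(phrases):
    """Join literal phrases into one case-insensitive alternation regex."""
    escaped = (re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


# The two base "rules".
PRODUCT_RE = build_alternation(PRODUCT_NAMES)
SELLING_RE = build_alternation(SELLING_PHRASES)


def sales_blurb(body: str) -> bool:
    """Meta-rule logic: hit only when BOTH base rules match the body."""
    return bool(PRODUCT_RE.search(body)) and bool(SELLING_RE.search(body))
```

Requiring both matches is what keeps the false-positive rate down. A ham message that merely mentions a product, or uses a phrase like "order now" in passing, does not trip the meta-rule on its own.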

The drawbacks of this approach are twofold:
1) it takes a while for the rules to become big enough to be really
   effective
2) editing a large alternate list is really hard, largely because each
   rule's regex must fit on a single line.

There's no easy fix for (1), but I did build a solution for (2) - an
awk-based script that converts a set of easily-edited rule definition
files into a .cf file containing one SA rule per definition file. If
this tool looks interesting to you, you can
find it here:

http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz

This file is a compressed source archive that includes documentation for
the tool and the definition file format.
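The portmanteau tool itself is written in awk, and its real definition format is documented in the archive. The Python sketch below is not that tool. It only illustrates the general idea under an assumed format: each hypothetical `*.def` file holds one regex alternative per line, lines starting with `#` are comments, and the file name becomes the rule name. Each file is collapsed into a single-line SpamAssassin `body` rule with a case-insensitive alternation.

```python
from pathlib import Path


def definition_to_rule(rule_name: str, text: str) -> str:
    """Collapse one definition file's text into a single-line body rule.

    Assumed format: one regex alternative per line; blank lines and lines
    starting with '#' are skipped. Surrounding whitespace is stripped, and
    patterns must not contain an unescaped '/' (the rule delimiter).
    """
    alternatives = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if not alternatives:
        raise ValueError(f"{rule_name}: no patterns found")
    return f"body {rule_name} /(?:{'|'.join(alternatives)})/i"


def build_cf(definition_dir: str, output_path: str) -> None:
    """Write one rule per '*.def' file; the upper-cased file stem is the rule name."""
    rules = [
        definition_to_rule(path.stem.upper(), path.read_text())
        for path in sorted(Path(definition_dir).glob("*.def"))
    ]
    Path(output_path).write_text("\n".join(rules) + "\n")
```

For example, a file `local_products.def` containing `replica watch` and `diet pill` on separate lines would produce `body LOCAL_PRODUCTS /(?:replica watch|diet pill)/i`. The list stays easy to edit because each alternative gets its own line, and only the generated .cf file needs the one-line form.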


Martin