On Thu, 2014-08-28 at 17:42 -0500, Robert A. Ober wrote:
> If you find a solution without my having to write more rules I would
> appreciate you letting the list know.
>
This may not be what you want to know, but.....

Some time back I was subscribed to a mailing list that was linked to a web forum which attracted a lot of spam and whose owners were averse to any form of spam filtering apart from banning spammers after the fact. Since I wanted to see the hammy list material I couldn't easily filter on headers and so was forced to develop another way to separate spam from ham.

My prime weapon was a sales blurb detector. This is basically two rules, one that recognises product names and one that recognises selling phrases, combined by a meta-rule that requires both rules to fire before it generates a hit. The two base rules are both vast lists of alternative patterns.

The benefit of going this way is that now, quite a long time later, this rule is still surprisingly effective in tagging UCE as spam. Better yet, the lists now need only infrequent updating, probably because there is a limit to the types of product that get offered via UCE and an even more limited set of phrases that are used to push them.

The drawbacks of this approach are twofold:

1) it takes a while for the rules to become big enough to be really effective
2) editing a large list of alternatives is really hard, largely because the regex is restricted to a single line.

There's no easy fix for (1), but I did build a solution for (2): an awk-based script that converts a set of easily-edited rule definition files into a .cf file containing a number of SA rules, each built from a rule definition file.

If this tool looks interesting to you, you can find it here:

http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz

This file is a compressed source archive that includes documentation for the tool and the definition file format.

Martin
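
A rough sketch of the two-list idea described above, for anyone who wants to picture it. This is not portmanteau and does not use its definition file format: the rule names, the tiny word lists and the score are all made up for illustration, and Python stands in for awk. It assumes the terms are plain words, joins each list into one single-line regex, and emits two unscored sub-rules plus a meta rule that only hits when both fire.

```python
import re

# Hypothetical sample lists; real lists would be far longer.
PRODUCTS = ["replica watches", "diet pills", "designer handbags"]
PHRASES = ["buy now", "lowest prices", "order today"]


def alternation(terms):
    """Join literal terms into one single-line, case-insensitive regex."""
    parts = []
    for term in terms:
        # Escape each word and allow any run of whitespace between words.
        words = [re.escape(word) for word in term.split()]
        parts.append(r"\s+".join(words))
    return r"/\b(?:" + "|".join(parts) + r")\b/i"


def build_rules(products, phrases, score=3.0):
    """Return the text of a small .cf fragment (rule names are hypothetical)."""
    lines = [
        # Names starting with "__" are sub-rules: they carry no score.
        "body     __SALES_PRODUCT  " + alternation(products),
        "body     __SALES_PHRASE   " + alternation(phrases),
        # The meta rule hits only when both sub-rules have fired.
        "meta     SALES_BLURB      (__SALES_PRODUCT && __SALES_PHRASE)",
        "describe SALES_BLURB      Product name plus selling phrase",
        "score    SALES_BLURB      %.1f" % score,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_rules(PRODUCTS, PHRASES), end="")
```

The output can be saved as a .cf file in the local rules directory and checked with `spamassassin --lint` before use. Keeping the lists as one term per entry is what makes them easy to edit; the single-line regex is only ever generated.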