On Thu, 2013-04-25 at 18:45 -0400, Andrew Talbot wrote:

> I like your point about the portmanteau rules (and I award you two
> Points for using one of my favorite words in a new - yet appropriate -
> manner!). 
> 
:-)


> I never thought about scoring each rule as a 0.001 or something really
> low then tying them all together with meta-rules. It's been a while
> since I separated everything out but I believe I have around 1000
> different checks (most of them portmanteau'd) so it seems like those
> meta rules would just get ... Messy. But it's a good idea, and I think
> I can especially make use of it in my Specific Word list. 
> 
The metas aren't too bad, though I must admit to building some of them
as metas of metas to keep all lines down to 72 chars or so. Most of
these submetas are simply lists of other rules that have been ANDed or
ORed together.
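
As a rough sketch of what I mean by metas of metas (every rule name and
pattern below is made up for illustration, not copied from my rules):

# low-level subrules; the leading '__' means they never score alone
body     __EX_W1    /\bexample-term-one\b/i
body     __EX_W2    /\bexample-term-two\b/i
header   __EX_S1    Subject =~ /\bexample-subject\b/i
header   __EX_F1    From    =~ /\@example\.com/i
# submetas: short lists of subrules ORed or ANDed together
meta     __EX_BODY  (__EX_W1 || __EX_W2)
meta     __EX_HDR   (__EX_S1 && __EX_F1)
# top-level meta combines the submetas and carries the score
meta     EX_COMBO   (__EX_BODY && __EX_HDR)
describe EX_COMBO   Example meta built from submetas
score    EX_COMBO   1.5

Splitting the expression this way keeps each meta line short, and a
submeta can be reused by more than one top-level rule.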

You may find that the Portmanteau Generator reduces your rule count
because it too can generate metas, which I use to deal with situations
where a term can appear in more than one case, e.g. a generated rule can
have this form:

describe GENRULE Example rule  
header   __GR1   Reply-to =~ /(\@spam1\.com|\@spammer\.co\.uk|....)
header   __GR2   From     =~ /(\@spam1\.com|\@spammer\.co\.uk|....)
uri      __GR3   /(\@spam1\.com|\@spammer\.co\.uk|....)
meta     GENRULE (__GR1 || __GR2 || __GR3)
score    GENRULE 1.5

This has two advantages. First, GENRULE is a single name that covers
the same spammy term regardless of where it was used. Second, each
generated rule has its own source file, which makes the three related
lists easier to edit, since there's a good chance that a spammy term
will be used in more than one of them.
  
> Keeping the rules under 1-2mb is a good rule of thumb to follow.
> Luckily we're nowhere near that point yet. 
> 
Nor am I. As I said, my biggest generated rule is a bit over 9 KB.

> Can I ask how many rules you have, and how many of those are meta 
> rules? 
>
I have 31 portmanteau rules, of which 9 contain metas. Only 12 of these
have a score exceeding 1.0, and these are not usually used as part of
higher-level metarules.

My local.cf is where any very specific rules live, along with the
higher-level metarules that use the low-scoring portmanteau rules. It
contains 129 rules which between them contain 96 'meta' statements. 36
of these have scores of under 1.0, so are probably used as components of
metarules. The total number of rules was obtained by using grep+wc to
count lines matching '^score'.
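
To illustrate how a higher-level metarule can sit on top of low-scoring
rules, here is a minimal sketch. The rule names, patterns and scores are
invented for the example; the 0.001 score simply follows the suggestion
quoted earlier in this thread:

body     EX_PM_WORDS  /\b(?:example-term-a|example-term-b)\b/i
describe EX_PM_WORDS  Example portmanteau word list
score    EX_PM_WORDS  0.001
header   EX_PM_FROM   From =~ /\@(?:example\.com|example\.net)/i
describe EX_PM_FROM   Example portmanteau sender list
score    EX_PM_FROM   0.001
# in local.cf: fires only when both low-scoring rules hit
meta     EX_BOTH      (EX_PM_WORDS && EX_PM_FROM)
describe EX_BOTH      Example words from an example sender
score    EX_BOTH      2.5

Each component adds almost nothing on its own, so a message is only
penalised noticeably when the combination fires.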

My local.cf and portmanteau.cf files are both 29 KB in size.


Martin



