On Thu, 2013-04-25 at 18:45 -0400, Andrew Talbot wrote:
> I like your point about the portmanteau rules (and I award you two
> Points for using one of my favorite words in a new - yet appropriate -
> manner!).
>
:-)

> I never thought about scoring each rule as a 0.001 or something really
> low then tying them all together with meta-rules. It's been a while
> since I separated everything out but I believe I have around 1000
> different checks (most of them portmanteau'd) so it seems like those
> meta rules would just get ... Messy. But it's a good idea, and I think
> I can especially make use of it in my Specific Word list.
>
The metas aren't too bad, though I must admit to building some of them
as metas of metas to keep all lines down to 72 chars or so. Most of
these submetas are simply lists of other rules that have been ANDed or
ORed together.

You may find that the Portmanteau Generator reduces your rule count,
because it too can generate metas. I use these to deal with situations
where a term can appear in more than one place, e.g. a generated rule
can have this form:

  describe GENRULE Example rule
  header __GR1 Reply-to =~ /(\@spam1\.com|\@spammer\.co\.uk|....)/
  header __GR2 From =~ /(\@spam1\.com|\@spammer\.co\.uk|....)/
  uri    __GR3 /(\@spam1\.com|\@spammer\.co\.uk|....)/
  meta   GENRULE (__GR1 || __GR2 || __GR3)
  score  GENRULE 1.5

which has two advantages. First, GENRULE is a single name that covers
the same spammy term regardless of where it was used. Second, since
each generated rule has its own source file, the three related lists
are easier to edit, because there's a good chance that a spammy term
will be used in more than one of them.

> Keeping the rules under 1-2mb is a good rule of thumb to follow.
> Luckily we're nowhere near that point yet.
>
Nor am I. As I said, my biggest generated rule is a bit over 9 KB.

> Can I ask how many rules you have, and how many of those are meta
> rules?
>
I have 31 portmanteau rules, of which 9 contain metas.
Only 12 of these have a score exceeding 1.0, and these are not usually
used as part of higher level metarules.

My local.cf is where any very specific rules live, along with the
higher level metarules that use the low scoring portmanteau rules. It
contains 129 rules which between them contain 96 'meta' statements. 36
of these have scores under 1.0, so are probably used as components of
metarules.

The total number of rules was obtained by using grep+wc to count lines
matching '^score'.

My local.cf and portmanteau.cf files are both 29 KB in size.

Martin
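
To illustrate the generated rule form and the grep+wc style count
described in this message, here is a minimal Python sketch. It is not
the Portmanteau Generator itself, whose implementation is not shown in
this thread. The function names (perl_quote, build_rule, count_lines),
the __GENRULE_n subrule naming and the two sample terms are all
assumptions made for the example.

```python
import re


def perl_quote(term):
    """Backslash-escape every character that is not a letter, digit or '_'."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", term)


def build_rule(name, description, score, terms, targets):
    """Return the text of one generated rule (hypothetical layout).

    targets is a list of (kind, header) pairs, where kind is "header"
    or "uri"; header is ignored (may be None) for "uri".
    """
    # One alternation shared by every subrule, so the term list is
    # maintained in a single place.
    pattern = "/(?:%s)/" % "|".join(perl_quote(t) for t in terms)
    lines = ["describe %s %s" % (name, description)]
    subrules = []
    for n, (kind, header) in enumerate(targets, start=1):
        sub = "__%s_%d" % (name, n)  # assumed naming scheme
        subrules.append(sub)
        if kind == "header":
            lines.append("header %s %s =~ %s" % (sub, header, pattern))
        elif kind == "uri":
            # uri rules take the pattern directly, with no header name
            lines.append("uri %s %s" % (sub, pattern))
        else:
            raise ValueError("unsupported rule type: %r" % kind)
    # The meta ORs the unscored subrules under a single scored name.
    lines.append("meta %s (%s)" % (name, " || ".join(subrules)))
    lines.append("score %s %s" % (name, score))
    return "\n".join(lines) + "\n"


def count_lines(cf_text, keyword):
    """Count lines starting with keyword, like grep '^keyword' | wc -l."""
    return sum(1 for line in cf_text.splitlines() if line.startswith(keyword))


if __name__ == "__main__":
    text = build_rule(
        "GENRULE",
        "Example rule",
        1.5,
        ["@spam1.com", "@spammer.co.uk"],  # sample terms only
        [("header", "Reply-To"), ("header", "From"), ("uri", None)],
    )
    print(text)
    print("scored rules:", count_lines(text, "score"))
    print("meta statements:", count_lines(text, "meta"))
```

Sharing one pattern across the subrules mirrors the point made above:
a term added once is checked in every place it might appear, while
only the meta carries a score.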