Hi,

> I had been thinking about creating a "multiple-rule" format for rules,
> where in order to match a rule, you would have to match a sequence of
> regexes, eg:
> 
> rawbody ASCII_FORM_ENTRY       /_{30,}/
> and rawbody ASCII_FORM_ENTRY  /[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/
> 
> the "and" prefix on a rule means to add that requirement, instead of
> overriding, and of course then have the thing do short-circuit
> evaluation.  I'll add this in bugzilla too.
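
A minimal sketch of the quoted "and" idea, in Python rather than SpamAssassin's Perl. The RULES table layout and the rule_matches helper are assumptions made for illustration, not an existing SpamAssassin format. Only the two regexes come from the quoted example.

```python
import re

# Hypothetical layout: rule name -> ordered list of regexes that must ALL match.
# Each "and" line from the proposal would append one more pattern to the list.
RULES = {
    "ASCII_FORM_ENTRY": [
        re.compile(r"_{30,}"),                                  # cheap check first
        re.compile(r"[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}"),  # costly check second
    ],
}

def rule_matches(name, text):
    """Return True only if every pattern of the rule matches.

    all() stops at the first pattern that fails, which gives the
    short-circuit evaluation: later regexes never run on a miss.
    """
    return all(pattern.search(text) for pattern in RULES[name])
```

Putting the cheapest pattern first means most messages are rejected before the more expensive regex is ever tried.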

Good idea. I had a similar idea, but I imagined a rule hierarchy: each rule is
assigned a parent rule name and is executed only if the parent rule matched.
The problem with my idea was the need to flag matched regexps for later
reuse...
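
A rough sketch of the hierarchy variant, again in Python. The rule name HAS_LONG_UNDERLINE, the simplified child regex, the table layout and the evaluate function are all hypothetical. The matched dictionary plays the role of the flags mentioned above.

```python
import re

# Hypothetical rule table: name -> (parent rule name or None, compiled regex).
RULES = {
    "HAS_LONG_UNDERLINE": (None, re.compile(r"_{30,}")),
    "ASCII_FORM_ENTRY": ("HAS_LONG_UNDERLINE",
                         re.compile(r"[A-Za-z]{2,}.{1,15}?\s+_{30,}")),
}

def evaluate(text):
    """Return the set of rule names that matched the text."""
    matched = {}  # flags: rule name -> bool, reused by every child rule

    def check(name):
        if name not in matched:
            parent, pattern = RULES[name]
            # A child regex runs only if its parent already matched.
            parent_ok = parent is None or check(parent)
            matched[name] = parent_ok and pattern.search(text) is not None
        return matched[name]

    for name in RULES:
        check(name)
    return {name for name, hit in matched.items() if hit}
```

Each rule is evaluated at most once per message, so the flag storage costs one boolean per rule.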

When will it be implemented, or better: when will you accept such a patch for
the ruleset? (I cannot modify the Perl code, as I don't know the Perl language
or the SpamAssassin core well enough, but I could help apply this optimization
to the ruleset.)

Anyway, I have a request:
could you add a new rule type for plain text matches?
Searching for a text string is simpler and usually faster than matching a
regexp, and many of your regexps are really just such strings (/some words/i).
There will be many more once we start adding multiple-rule things.
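
To illustrate the request, here is a small Python sketch. The helper names is_plain_literal and text_rule_matches are made up for this example, and the metacharacter list is a simplification, not a full regex parser.

```python
# Characters that give a pattern regex meaning; if none are present,
# the pattern is just a literal string (simplified assumption).
REGEX_META = set(r".^$*+?{}[]\|()")

def is_plain_literal(pattern):
    """True if the pattern contains no regex metacharacters."""
    return not any(ch in REGEX_META for ch in pattern)

def text_rule_matches(needle, body, ignore_case=True):
    """Plain substring search, the equivalent of /needle/ or /needle/i."""
    if ignore_case:
        return needle.lower() in body.lower()
    return needle in body
```

A ruleset converter could use a check like is_plain_literal to find the regexps that can be turned into the new plain-text rule type.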

I will implement the spam phrase check this way:
go through the whole text, split it into words, calculate a hash for each word
and look it up in a hash-table-accelerated word table.
The word table contains a word->id mapping: each word has a unique serial
number, word_id.
Then, for each word:
   ++word_match[word_id];
   ++phrase_match[previous_word_id][word_id];

So, when executing rule matching, if a rule is a plain text string match
(instead of a regexp), we could simply check the word_match array
(at least in my C version, as my ruleset -> rules.c precompiler could
replace or extend such rules with word_id numbers).
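
The counting pass described above could look roughly like this in Python (the real implementation is meant to be C). The tokenizer, the function names and the choice that an unknown word breaks a phrase are assumptions. Only word_id, word_match and phrase_match come from the description.

```python
import re
from collections import Counter

def build_word_table(words):
    """Assign each distinct word a unique serial number (its word_id)."""
    table = {}  # a dict is a hash table, so lookups hash the word
    for word in words:
        table.setdefault(word.lower(), len(table))
    return table

def count_words(text, word_table):
    """Single pass over the text, counting known words and word pairs."""
    word_match = Counter()    # word_id -> number of occurrences
    phrase_match = Counter()  # (previous_word_id, word_id) -> occurrences
    previous_word_id = None
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        word_id = word_table.get(word)
        if word_id is not None:
            word_match[word_id] += 1
            if previous_word_id is not None:
                phrase_match[(previous_word_id, word_id)] += 1
        # Assumption: a word missing from the table breaks the phrase.
        previous_word_id = word_id
    return word_match, phrase_match
```

A plain-text rule for a single word then reduces to checking whether word_match[word_id] is non-zero.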

With the right balance, it could reduce rule matching to a single-pass
word/phrase count, followed by matching only those regexps whose required
word was counted. It could really speed up the whole process a lot.
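
A sketch of that prefilter in Python, under the assumption that every regexp rule is annotated with one word it requires. The rule names, the annotation and match_rules are hypothetical.

```python
import re

# Hypothetical rules: name -> (word the regex cannot match without, regex).
RULES = {
    "FREE_MONEY": ("money", re.compile(r"free\s+money", re.I)),
    "CLICK_HERE": ("click", re.compile(r"click\s+here\s+now", re.I)),
}

def match_rules(text):
    """Return (rules that hit, rules skipped without running their regex)."""
    # Single pass: collect the words present in the message.
    seen = set(re.findall(r"[a-z0-9']+", text.lower()))
    hits, skipped = [], []
    for name, (required_word, pattern) in RULES.items():
        if required_word not in seen:
            skipped.append(name)  # regex never runs for this message
            continue
        if pattern.search(text):
            hits.append(name)
    return hits, skipped
```

How much this saves depends on how selective the required words are, which is the balance mentioned above.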


A'rpi / Astral & ESP-team

--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
