On Thu, 2002-02-21 at 13:42, Arpi wrote:
> when will it be implemented, or better: when will you accept such patch for
> ruleset? (i cannot modify the perl code, as i don't know the perl language
> nor the spamassassin core enough, but i could help making this optimization
> to the ruleset)

You can attach rulefile patches to bug #47 in bugzilla. I'll stick them in CVS once the coding is done. It's not going to go into 2.1, but it'll be early in 2.2:

http://bugzilla.spamassassin.org/show_bug.cgi?id=47

> anyway, i have a request:
> could you add a new rule type, for plain text matches?
> searching for a text string is always simpler and faster than for regexps,
> and many of your regexps are such strings (/some words/i) and there will be
> much more when start adding multiple-rule things.

If you have a smart regexp library, then when it compiles a simple regex it should in fact just be doing whatever "simpler and faster" comparison you'd use to look for a text substring. In fact, because it has done some pre-compilation, it can do fancy Boyer-Moore-style searching and run even faster than a typical strstr() implementation.

> and i will implement spam phrase check such way:
> go through the whole text, split it to words, calculate hash for each word
> and lookup it in hash-table accelerated word table.
> the word table contains a word->id mapping, each word has a unique serial
> number id, word_id.
> then:
> ++word_match[word_id];
> ++phrase_match[previous_word_id][word_id];
>
> so, when executing rule matching, and we have plain text string match
> (instead of regexp) we could simply check the word_match array.
> (at least in my C version, as my ruleset -> rules.c precompiler could
> replace/extend these with word_id numbers)
>
> with the right balance, it could reduce rule matching to a single-pass
> word/phrase counting and then matching only regexps having their requested
> word counted. it could really speed up the whole process a lot.

There are many rules, though, that don't really have any "words" you can pre-match against, so you're still going to have to do a medium-sized number of regex matches. You might not gain all that much over just doing the multi-match system, and the amount of code and the complexity of the program will go up substantially.

C

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk