Hi,

> > > rawbody ASCII_FORM_ENTRY        /[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/

> > [^<] means "any character except '<'".

> anyway, it explains why this regexp is so slow :(
> it partially matches at every character position of the text, and only at
> the end (_{30,}) does it turn out to be a bad match...

ok. i've developed a solution that makes execution 3 times faster (!!!)
(and much more will come when i finish it)

the trick is very simple: assign a single word to the few regexps where it
can help. currently i'm using:

strstr ASCII_FORM_ENTRY         ____________________
strstr COMMUNIGATE              CommuniGate
strstr WANTS_CREDIT_CARD        credit
strstr ASKS_BILLING_ADDRESS     billing
strstr CYBER_FIRE_POWER         FirePower
strstr HR_3113                  3113
strstr WORK_AT_HOME             HOME
strstr MAILTO_LINK              mailto
strstr YOUR_INCOME              income
strstr BE_AMAZED                amazed
strstr ITS_EFFECTIVE            effective

my code executes the (slow) regexp matching ONLY if the input text
(header/body/rawbody...) contains the assigned word.
it's very useful for regexps that contain a fixed word but don't begin
with some rare fixed char. for example, it reduced the execution time
of ASCII_FORM_ENTRY from >1ms (sometimes >3s) to <0.1ms!
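
A minimal sketch of such a prefilter in C, assuming POSIX <regex.h>. The
struct, the function names and the call counter are hypothetical and only
illustrate the idea; they are not taken from any existing SpamAssassin code.
POSIX extended regexps have no \s or lazy quantifiers, so a rule like
ASCII_FORM_ENTRY would have to be written as something like
[^<][A-Za-z][A-Za-z]+.{1,15}[[:space:]]+_{30,} for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical rule record; field names are illustrative only. */
struct rule {
    const char *name;  /* e.g. "ASCII_FORM_ENTRY" */
    const char *word;  /* literal every match must contain, or NULL */
    regex_t     re;    /* compiled (slow) regexp */
};

/* Counts how often the slow path actually ran, for measurement. */
static unsigned long regexec_calls = 0;

/* Compile a rule; returns 0 on success, like regcomp(). */
static int rule_init(struct rule *r, const char *name,
                     const char *word, const char *pattern)
{
    assert(r != NULL && pattern != NULL);
    r->name = name;
    r->word = word;
    return regcomp(&r->re, pattern, REG_EXTENDED | REG_NOSUB);
}

/* Returns 1 if the rule matches text, 0 otherwise. */
static int rule_matches(const struct rule *r, const char *text)
{
    /* Cheap test first: without the word the regexp cannot match. */
    if (r->word != NULL && strstr(text, r->word) == NULL)
        return 0;
    regexec_calls++;
    return regexec(&r->re, text, 0, NULL, 0) == 0;
}
```

The prefilter is only safe when every possible match of the regexp contains
the word, so the word has to be picked per rule by hand. A rule with word set
to NULL simply falls through to the regexp as before. Call regfree() on the
compiled regexp when the rule is no longer needed.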

I don't know how usable/possible this is in the perl version, but i want to
use such acceleration in the C version. So, would you accept a patch to the
ruleset that adds such fields (i called it 'strstr', though it's not a good
name) and commit it to CVS?

anyway, a different syntax should be introduced to distinguish between
case-sensitive and case-insensitive word matching.
(strstr and stristr, or maybe strstr /word/i ?)
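
For the case-insensitive variant, standard C has no stristr(); glibc and the
BSDs provide strcasestr() as an extension. A portable sketch, with the
function name taken from the proposal above and everything else an
assumption, could look like this:

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive strstr(): returns a pointer to the first match of
 * needle in hay, or NULL if there is none. ASCII/locale-based folding
 * via tolower(); an empty needle matches at the start of hay. */
static const char *stristr(const char *hay, const char *needle)
{
    if (*needle == '\0')
        return hay;
    for (; *hay != '\0'; hay++) {
        const char *h = hay;
        const char *n = needle;
        while (*h != '\0' && *n != '\0' &&
               tolower((unsigned char)*h) == tolower((unsigned char)*n)) {
            h++;
            n++;
        }
        if (*n == '\0')
            return hay;  /* whole needle consumed: match starts here */
    }
    return NULL;
}
```

This naive loop is O(n*m) in the worst case, which should still be far
cheaper than a backtracking regexp for short words like the ones listed
above.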


A'rpi / Astral & ESP-team

--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu
