At 06:33 PM 5/10/02 +0200, Klaus Heinz wrote:
>Craig R Hughes wrote:
>
>> purpose, here's a list of the current (CVS) top 20 most expensive rules
>> computationally:
>
>I was just looking at some of my rule changes and noticed I never
>thought about using the non-greedy quantifiers like {n,m}?.
>In the rule set for SA 2.20 I found only 3 rules (ASCII_FORM_ENTRY,
>VERY_SUSP_RECIPS, VERY_SUSP_CC_RECIPS) where the {n,m}? syntax is used.
>
>Could this be another area where performance might be improved?
>

In some cases it can make things slower.
It depends on the rule and on the message being scanned.

But changing from greedy to non-greedy changes the behaviour of the rule, 
which might be a big problem.
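To make the behaviour change concrete, here is a small Perl illustration using a made-up form line (not taken from any real rule or message). Both patterns find a match, but the greedy `.*` takes as much as it can and then backtracks, while the non-greedy `.*?` stops as early as possible, so the text each one matches is different:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical form line; not taken from any real rule or message.
my $text = "Name: ____  Date: ____";

# Greedy: .* runs to the end of the string, then backtracks to the LAST underscore.
my ($greedy) = $text =~ /(Name:.*_)/;

# Non-greedy: .*? grows one character at a time and stops at the FIRST underscore.
my ($lazy) = $text =~ /(Name:.*?_)/;

print "greedy:     '$greedy'\n";
print "non-greedy: '$lazy'\n";
```

Whether that difference matters depends on what the rule (or the code around it) does with the match.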

The best improvements I've gotten so far have come from including
longer stretches of literal text wherever possible.
For example, use
 /[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*___________{20,}/
instead of 
 /[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*_{30,}/
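The two ASCII_FORM_ENTRY patterns above should accept exactly the same lines: ten literal underscores followed by `_{20,}` still demands a run of at least 30. A quick Perl sketch to check that on a few made-up lines (the sample text is an assumption, and the check says nothing about speed):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ten literal underscores, interpolated so the count is easy to verify.
my $run = '_' x 10;

# Original form and the "spelled out" form from the text.
my $short   = qr/[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*_{30,}/;
my $spelled = qr/[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*${run}_{20,}/;

# Made-up lines: one underscore short of the limit, exactly at it, and well over it.
my %result;
for my $n (29, 30, 45) {
    my $line = "Your Name:\t" . ('_' x $n);
    my $m1 = $line =~ $short   ? 1 : 0;
    my $m2 = $line =~ $spelled ? 1 : 0;
    $result{$n} = "$m1$m2";
    printf "%2d underscores: short=%d spelled=%d\n", $n, $m1, $m2;
}
```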


I've also gotten improvements by putting "anchor" text in front of '()' groups.
For example, use
 /\b(?:100%|completely|totally|absolutely) FREE/i
instead of 
 /(?:100%|completely|totally|absolutely) FREE/i

and use 
/do(?:n'?t delete this| not delete)/i
instead of 
/(?:don'?t delete this|do not delete)/i
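One caveat when comparing these pairs: factoring out "do" is a pure rewrite, but adding `\b` also adds a condition (a word boundary before the group), so a line like "x100% FREE" would stop hitting. A small Perl sketch that runs both versions of each rule over some made-up lines and records where they disagree (sample lines are assumptions; this checks matching behaviour only, not speed):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rule pairs from the text: [original, "anchored" rewrite].
my %pairs = (
    free     => [ qr/(?:100%|completely|totally|absolutely) FREE/i,
                  qr/\b(?:100%|completely|totally|absolutely) FREE/i ],
    nodelete => [ qr/(?:don'?t delete this|do not delete)/i,
                  qr/do(?:n'?t delete this| not delete)/i ],
);

# Made-up test lines.
my @lines = (
    "This offer is 100% FREE",
    "Totally free, no catch",
    "Item x100% FREE",                   # no word boundary before "100%"
    "Please don't delete this message",
    "Please dont delete this",
    "DO NOT DELETE",
    "do delete it",
);

# Record every line where the original and the rewrite give different answers.
my %differs;
for my $name (sort keys %pairs) {
    my ($orig, $anchored) = @{ $pairs{$name} };
    for my $line (@lines) {
        my $o = $line =~ $orig     ? 1 : 0;
        my $n = $line =~ $anchored ? 1 : 0;
        printf "%-8s %-34s orig=%d anchored=%d\n", $name, $line, $o, $n;
        $differs{$name}{$line} = 1 if $o != $n;
    }
}
```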


Note: I did not test these inside SpamAssassin itself; I timed them with
the following script, so results in a real run may differ.

--- begin test script ---
#!/usr/bin/perl -w
use strict;
use Time::HiRes qw( gettimeofday tv_interval );

# Read the test message from the files named on the command line (or STDIN)
my @test_email = <>;
my $test_line  = join " ", @test_email;

# Time 11 runs of the ASCII_FORM_ENTRY pattern, in microseconds
my @times;
for (1 .. 11) {
    my $t0 = [gettimeofday];
    $test_line =~ /[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*_{30,}/;
    push @times, tv_interval($t0) * 1_000_000;
}

# Report the median; sort numerically (a plain sort() compares as strings)
my @sorted = sort { $a <=> $b } @times;
printf "%9d ASCII_FORM_ENTRY\n", $sorted[5];

--- end test script ---
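As an alternative to hand-rolled gettimeofday loops, Perl's standard Benchmark module can repeat and compare the variants for you. This is only a sketch: the message body is synthetic (an assumption; a real spam or ham message would be more representative), and it first checks that both patterns agree on that body before timing them:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic stand-in for a message body (assumption: use a real message in practice).
my $body = join "\n", ("Some ordinary text in a mail body.") x 200,
                      "Your Name:\t" . ('_' x 40);

# The two ASCII_FORM_ENTRY variants discussed above.
my $run = '_' x 10;
my %rules = (
    short   => qr/[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*_{30,}/,
    spelled => qr/[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*${run}_{20,}/,
);

# Sanity check: both variants must give the same answer before timing them.
my %hit = map { $_ => ($body =~ $rules{$_} ? 1 : 0) } keys %rules;
die "variants disagree\n" unless $hit{short} == $hit{spelled};

# Run each variant for at least 1 CPU second and print a rate comparison table.
cmpthese(-1, {
    short   => sub { my $m = $body =~ $rules{short} },
    spelled => sub { my $m = $body =~ $rules{spelled} },
});
```

Benchmark handles the repetition and prints iterations per second for each variant side by side, which avoids reading a single median from a short loop.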

Those who are interested might want to look at
www.spamwolf.com/sa/test_maker.html

Scott Nelson <[EMAIL PROTECTED]>

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
