At 06:33 PM 5/10/02 +0200, Klaus Heinz wrote:
>Craig R Hughes wrote:
>
>> purpose, here's a list of the current (CVS) top 20 most expensive rules
>> computationally:
>
>I was just looking at some of my rule changes and noticed I never
>thought about using the non-greedy quantifiers like {n,m}?.
>In the rule set for SA 2.20 I found only 3 rules (ASCII_FORM_ENTRY,
>VERY_SUSP_RECIPS, VERY_SUSP_CC_RECIPS) where the {n,m}? syntax is used.
>
>Could this be another area where performance might be improved?
>

In some cases it can make things worse.  It depends on the rule and
the message.  But changing from greedy to non-greedy changes the
behaviour of the rule, which might be a big problem.

The best improvements I've gotten so far have been by including
longer stretches of plain text whenever possible.  I.e.
  /[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*___________{20,}/
instead of
  /[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*_{30,}/

And by putting "anchor" text in front of '()' operators.  I.e. use
  /\b(?:100%|completely|totally|absolutely) FREE/i
instead of
  /(?:100%|completely|totally|absolutely) FREE/i
and use
  /do(?:n'?t delete this| not delete)/i
instead of
  /(?:don'?t delete this|do not delete)/i

Note: I did not test these in place; I used the following script,
so the results inside SpamAssassin may be different.

--- begin test script ---
#!/usr/bin/perl -w
use strict;
use Time::HiRes qw( gettimeofday );

# Read the whole test message from STDIN (or files named on the
# command line) and join it into one string.
my @test_email = <>;
my $test_line = join " ", @test_email;

# Time the same match 11 times.
my @times = ();
for (my $i = 0; $i < 11; $i++) {
  my ($seconds1, $microseconds1) = gettimeofday;
  $test_line =~ /[^<][A-Za-z][A-Za-z]+.{1,15}?[\x09\x20]*_{30,}/ ;
  my ($seconds2, $microseconds2) = gettimeofday;
  my $et = ($seconds2-$seconds1)*1000000 + ($microseconds2-$microseconds1);
  push(@times, $et);
}

# Sort numerically (a plain sort() compares as strings) and print
# the median of the 11 runs, in microseconds.
my @t2 = sort { $a <=> $b } @times;
printf "%9d ASCII_FORM_ENTRY\n", $t2[5];
--- end test script ---

Those who are interested might want to look at
www.spamwolf.com/sa/test_maker.html

Scott Nelson <[EMAIL PROTECTED]>
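
As an illustration of the "anchor text" rewrite of the "delete" rule above, here is a minimal sketch that is not part of the original post. It checks that the two forms of the rule agree on a few sample lines, then times each form the same way the test script does (median of 11 runs). The sample lines and the helper names count_mismatches and median_usec are made up for this example, and timings from a standalone script may not reflect what happens inside SpamAssassin.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( gettimeofday tv_interval );

# The rule as originally written, and the variant with the shared
# literal "do" moved in front of the alternation.
my $plain    = qr/(?:don'?t delete this|do not delete)/i;
my $anchored = qr/do(?:n'?t delete this| not delete)/i;

# Hypothetical sample lines, invented for illustration only.
my @samples = (
    "Please DO NOT DELETE until you have read this",
    "dont delete this message",
    "Don't delete this!",
    "do not remove",
    "nothing relevant here",
);

# Count the lines on which the two patterns give different answers.
sub count_mismatches {
    my ($re_a, $re_b, @lines) = @_;
    my $mismatches = 0;
    for my $line (@lines) {
        my $hit_a = ($line =~ $re_a) ? 1 : 0;
        my $hit_b = ($line =~ $re_b) ? 1 : 0;
        $mismatches++ if $hit_a != $hit_b;
    }
    return $mismatches;
}

# Run one match $runs times and return the median elapsed time
# in microseconds.
sub median_usec {
    my ($re, $text, $runs) = @_;
    my @times;
    for (1 .. $runs) {
        my $t0 = [gettimeofday];
        my $matched = $text =~ $re;
        push @times, tv_interval($t0) * 1_000_000;
    }
    my @sorted = sort { $a <=> $b } @times;
    return $sorted[ int(@sorted / 2) ];
}

my $text = join " ", @samples;
printf "mismatches: %d\n", count_mismatches($plain, $anchored, @samples);
printf "%9.1f usec plain\n",    median_usec($plain,    $text, 11);
printf "%9.1f usec anchored\n", median_usec($anchored, $text, 11);
```

The first line of output should report zero mismatches, since the second pattern is the first one with the common prefix factored out. The timings will vary by machine, Perl version and input, so try it on real messages before changing a rule.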