Karsten Bräckelmann wrote:
On Fri, 2011-05-27 at 10:38 -0400, Kris Deugau wrote:
Mmmm. I don't *think* so, but testing the message on a stock SA 3.3.1
took "only" a minute (on slow hardware) vs 13 (on my much faster desktop).
The latter being the production system with the custom rules, or at
least having an identical set of custom rules?
Yeah; I create the rules on my desktop (usually with an example spam on
hand to make sure the rule hits what I intended it to hit), commit to
svn, and periodically merge changes to a branch that's autopublished in
much the same way as the official stock rules and JM's
SOUGHT rules.
Yes, that sounds like the culprit indeed is one or more custom rules. If
that "much faster" equals twice as fast,
Probably closer to 4-6x; dual PIII/866 -> Core i3 3GHz.
Bisection is your friend.
Go hunt down the bugger that, in conjunction with this specific sample,
kills your performance. Once you've found it, maybe you can post it?
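Bisecting the custom rules means: load only half of them, rerun the sample, keep whichever half is still slow, and repeat. Below is a minimal Python sketch of that loop. `is_slow` is a hypothetical callback you would supply yourself (for example, write the subset to a scratch config and time a scan of the sample). It assumes a single rule is responsible; if the slowdown only appears when two rules interact, plain halving can lose it.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bisect_culprit(rules: Sequence[T], is_slow: Callable[[List[T]], bool]) -> T:
    """Narrow a rule list down to the one rule that reproduces a slowdown.

    Assumes exactly one rule is responsible. Needs about log2(len(rules)) + 1
    calls to is_slow, each of which is expected to be the expensive part.
    """
    candidates = list(rules)
    if not candidates or not is_slow(candidates):
        raise ValueError("the full rule set does not reproduce the slowdown")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # With a single culprit, if the first half is fast the second half
        # must be the slow one, so it does not need to be tested separately.
        candidates = first_half if is_slow(first_half) else candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    names = [f"RULE_{i}" for i in range(1000)]  # hypothetical rule names
    # Stand-in for "time a scan of the sample with only these rules loaded".
    print(bisect_culprit(names, lambda subset: "RULE_737" in subset))  # RULE_737
```

With 1000 rules this needs roughly 11 test runs instead of 1000.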
Seems to have been this:
rawbody TOO_MANY_DIVS /(?:<[Dd][Ii][Vv]>(?:\s|\n|\&nbsp\;)*){6}/
describe TOO_MANY_DIVS 6 or more <div> tags in a row
score TOO_MANY_DIVS 0.75
Changing the * to {,100} drops the processing time down to ~8s.
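One likely contributor can be sketched in Python, assuming its backtracking `re` engine behaves closely enough like Perl's to show the effect (Perl has its own optimizations, so real timings will differ). `\s` already matches a newline, so in `(?:\s|\n|...)*` each newline in a run can be consumed by either branch, and a failing match may try on the order of 2^n splits for a run of n newlines. Dropping the redundant `\n` branch removes the ambiguity without imposing a bound. The pattern and helper names are illustrative, and the entity alternative is assumed to be `&nbsp;`. Separately, as far as I know Perl only accepts `{,n}` as shorthand for `{0,n}` from 5.34 onward; older versions treat it as literal text, which would also make the bounded rule fast simply by no longer matching. `{0,100}` is the safer spelling, and the bounded rule is worth re-checking against known hits.

```python
import re
import time

# Pattern as posted, with the entity alternative assumed to be "&nbsp;".
# "\n" is redundant next to "\s" (which already matches newlines), so
# every newline in a run can be consumed by either branch.
AMBIGUOUS = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|\n|&nbsp;)*){6}")

# Same intent with non-overlapping alternatives: each character can be
# matched only one way, so a failed match backtracks linearly.
DISJOINT = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|&nbsp;)*){6}")


def make_sample(divs, newlines):
    """Build <div> tags separated by runs of newlines, ending in junk."""
    return ("<div>" + "\n" * newlines) * divs + "x"


def timed_search(pattern, text):
    """Return (matched?, seconds taken) for a single search."""
    start = time.perf_counter()
    found = pattern.search(text) is not None
    return found, time.perf_counter() - start


if __name__ == "__main__":
    # One <div> followed by n newlines can never satisfy {6}, so the
    # search must fail; the ambiguous pattern explores ~2**n ways first.
    for n in (10, 12, 14, 16):
        text = make_sample(divs=1, newlines=n)
        _, slow = timed_search(AMBIGUOUS, text)
        _, fast = timed_search(DISJOINT, text)
        print(f"n={n:2d}  ambiguous={slow:.4f}s  disjoint={fast:.6f}s")
```

On a typical machine the ambiguous column should grow by roughly 4x per step of 2 newlines, while the disjoint column stays flat.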
I've got a number of similar rules for other "many logical/physical
linebreaks with no content" cases. I don't have a specific spample to point
to just now, but from memory the original targets really did have a
widely varying number of linebreaks or whitespace (logical or otherwise)
between the HTML tags, and I've been bitten before by
bounds on matches (related rules for garbage HTML comments) not being
*large* enough. O_o
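The trade-off described here shows up in a small Python sketch (pattern names hypothetical, `&nbsp;` assumed as the entity alternative). A bound caps the work per run but silently stops matching once any single gap exceeds it. Removing the overlapping `\n` alternative instead keeps the rule unbounded without the ambiguity. Whether that is fast enough in SpamAssassin's Perl engine on the real sample is something to test rather than assume.

```python
import re

# Hypothetical names; "&nbsp;" is assumed as the entity alternative.
BOUNDED = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|&nbsp;){0,100}){6}")
UNBOUNDED = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|&nbsp;)*){6}")


def six_divs(gap):
    """Six <div> tags, each followed by `gap` spaces."""
    return ("<div>" + " " * gap) * 6


for gap in (50, 100, 101, 500):
    text = six_divs(gap)
    print(gap, bool(BOUNDED.search(text)), bool(UNBOUNDED.search(text)))
# 50 True True
# 100 True True
# 101 False True
# 500 False True
```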
This particular message has page after page of:
=09=09=09
=09=09=09
=09=09=09
=09
=09
=09
etc., with a few <div> or <font> tags for excitement.
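`=09` is the quoted-printable escape for a tab (0x09). As I read the `rawbody` documentation, SpamAssassin decodes quoted-printable and base64 before rawbody rules run but keeps HTML tags and line breaks, so the rule above sees real tabs and newlines. A small Python sketch with the standard `quopri` module shows how long a single whitespace run after one `<div>` can get. The repeat count of 500 is invented for illustration, not measured from the actual message.

```python
import quopri
import re

# "=09" is the quoted-printable escape for a tab; these lines mirror the
# shape of the excerpt above.
encoded = b"=09=09=09\n=09=09=09\n=09=09=09\n=09\n=09\n=09\n"
decoded = quopri.decodestring(encoded).decode("ascii")

# Hypothetical body: the decoded block repeated many times, with a few
# tags in between (the repeat count is made up, not from the spam).
filler = decoded * 500
body = "<div>" + filler + "<font>" + filler + "<div>" + filler + "</font>"

# Measure the whitespace run right after the first <div>: every newline
# in it is a point where "\s|\n" offers two ways to match.
run = re.match(r"<div>(\s*)", body).group(1)
print(len(run), run.count("\n"))  # 9000 3000
```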
-kgd