Re: Large (usually legitimate) HTML mails choking SA

Kris Deugau Fri, 27 May 2011 10:14:55 -0700

Karsten Bräckelmann wrote:

On Fri, 2011-05-27 at 10:38 -0400, Kris Deugau wrote:

Mmmm.  I don't *think* so, but testing the message on a stock SA 3.3.1
took "only" a minute (on slow hardware) vs 13 (on my much faster desktop).


The latter being the production system with the custom rules, or at
least having an identical set of custom rules?

Yeah; I create the rules on my desktop (usually with an example spam on hand to make sure the rule hits what I intended it to hit), commit to svn, and periodically merge changes to a branch that's autopublished in something resembling the same way as the official stock rules and JM's SOUGHT rules.

Yes, that sounds like the culprit indeed is one or more custom rules. If
that "much faster" equals twice as fast,


Probably closer to 4-6x;  dual PIII/866 -> Core i3 3GHz.

Bisection is your friend.

Go hunt down the bugger that, in conjunction with the specific sample,
kills your performance. Once you've found it, maybe you can post it?
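
For anyone following along, the bisection suggested above can be sketched roughly as below. This is only an illustration, not something from the thread: `bisect_slow_rule` and the `is_slow` callback are hypothetical names, and the sketch assumes exactly one rule is responsible and that the slowdown is reproducible. In practice `is_slow` would run SA against the sample message with only the given subset of custom rules loaded and compare the scan time against some threshold.

```python
from typing import Callable, Sequence


def bisect_slow_rule(
    rules: Sequence[str],
    is_slow: Callable[[Sequence[str]], bool],
) -> str:
    """Narrow a rule set down to the one rule that triggers the slowdown.

    Assumes (hypothetically) that exactly one rule is the culprit and that
    is_slow(subset) is True whenever the subset contains it.
    """
    candidates = list(rules)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        # Keep whichever half still reproduces the slowdown.
        candidates = first if is_slow(first) else second
    return candidates[0]


if __name__ == "__main__":
    # Stand-in predicate for demonstration only; a real one would time a scan.
    rule_names = ["RULE_A", "RULE_B", "TOO_MANY_DIVS", "RULE_C", "RULE_D"]
    culprit = bisect_slow_rule(rule_names, lambda subset: "TOO_MANY_DIVS" in subset)
    print(culprit)
```

Each round halves the candidate set, so a few hundred custom rules need fewer than ten timed runs. If several rules interact, the single-culprit assumption breaks and the halves have to be checked more carefully.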


Seems to have been this:

rawbody TOO_MANY_DIVS   /(?:<[Dd][Ii][Vv]>(?:\s|\n|\&nbsp\;)*){6}/
describe TOO_MANY_DIVS  6 or more <div> tags in a row
score TOO_MANY_DIVS     0.75

Changing the * to {,100} drops the processing time down to ~8s.
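
A rough way to compare the unbounded and bounded forms outside SA is sketched below in Python. It is an assumption-laden illustration rather than the production rule: Python's `re` is not Perl's engine, the names `UNBOUNDED`, `BOUNDED` and `hits` are made up for the example, and the bound is spelled `{0,100}` with an explicit lower bound because not every regex engine accepts an empty one. The bounded variant also drops the `\n` alternative, since `\s` already matches a newline.

```python
import re

# The rule's pattern as posted, with an unbounded run of filler after each <div>.
UNBOUNDED = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|\n|\&nbsp\;)*){6}")

# Bounded variant (assumed intent of the change): at most 100 filler items
# after each <div>; "\n" is omitted because "\s" already covers it.
BOUNDED = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|&nbsp;){0,100}){6}")


def hits(pattern: re.Pattern, body: str) -> bool:
    """Return True if the pattern matches anywhere in the body."""
    return pattern.search(body) is not None


# Six <div> tags separated only by whitespace and &nbsp;
six_divs = "<div>\n\t&nbsp;" * 6
# Only five <div> tags, followed by a different tag
five_divs = "<div>\n\t" * 5 + "<font>"
# Six <div> tags, but with 101 spaces after the first one
too_far = "<div>" + " " * 101 + "<div>" * 5

if __name__ == "__main__":
    for name, body in [("six_divs", six_divs), ("five_divs", five_divs), ("too_far", too_far)]:
        print(name, hits(UNBOUNDED, body), hits(BOUNDED, body))
```

The `too_far` sample shows the trade-off of bounding: a gap longer than the bound is still caught by the unbounded pattern but missed by the bounded one. To look at the speed side, wrap the `hits` calls in `time.perf_counter()` and feed in the real message body.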

I've got a number of similar rules for other "many logical/physical linebreaks with no content". I don't have a specific spample to point to just now, but from memory the original targets really did have a widely varying number of linebreaks or whitespace (logical or otherwise) in between the HTML tags, and I've been bitten before with applying bounds to matches (related rules for garbage HTML comments) not being *large* enough. O_o


This particular message has page after page of:

=09=09=09
=09=09=09
=09=09=09
=09
=09
=09

etc, with a few <div> or <font> tags for excitement.

-kgd
