On Fri, 2011-05-27 at 10:38 -0400, Kris Deugau wrote:
> Karsten Bräckelmann wrote:
> > > However, we've just had a couple of *legitimate* messages get stuck for
> > > essentially the same reason - a whole lot of pathologically bad HTML.
> >
> > Rings a bell. Such reports usually turned out to be caused by custom
> > rules. Any custom rawbody rules, in particular ones matching HTML tags,
> 
> Yes, a few.
> 
> > or otherwise prone to trigger RE backtracking? (That is, may consume
> > large sub-strings, before a following sub-pattern.)
> 
> Mmmm.  I don't *think* so, but testing the message on a stock SA 3.3.1 
> took "only" a minute (on slow hardware) vs 13 (on my much faster desktop).

The latter being the production system with the custom rules, or at
least having an identical set of custom rules?

Yes, that sounds like the culprit is indeed one or more custom rules. If
that "much faster" means twice as fast, your custom rules are taking
25(!) times as long as the complete stock rule-set, including all the
parsing and stuff.
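Spelled out, this takes "twice as fast" literally and assumes the 13 minutes were measured with the custom rules loaded. The stock run is scaled to desktop speed, and the custom rules get whatever time is left over:

```latex
\begin{aligned}
t_{\text{stock}}^{\text{desktop}} &= \tfrac{1}{2} \times 1\ \text{min} = 0.5\ \text{min} \\
t_{\text{custom}} &= 13\ \text{min} - 0.5\ \text{min} = 12.5\ \text{min} \\
\frac{t_{\text{custom}}}{t_{\text{stock}}^{\text{desktop}}} &= \frac{12.5\ \text{min}}{0.5\ \text{min}} = 25
\end{aligned}
```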

Bisection is your friend.
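Here is a minimal sketch of that bisection in Python. It assumes a single culprit rule, plus a hypothetical `is_slow(subset)` check that loads only that subset of your custom rules on top of the stock set and times a scan of the problem message. `bisect_culprit` and `is_slow` are illustrative names, not SpamAssassin tools.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bisect_culprit(rules: Sequence[T], is_slow: Callable[[Sequence[T]], bool]) -> T:
    """Return the rule that makes the scan slow.

    Assumes exactly one rule is responsible, so is_slow(subset) is True
    iff the culprit is in subset. If two rules are only slow *together*,
    bisection can split them apart and point at the wrong rule.
    """
    candidates = list(rules)
    if not is_slow(candidates):
        raise ValueError("full rule set is not slow; nothing to bisect")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        # Keep whichever half still reproduces the slowdown; with a single
        # culprit, "not in the first half" means "in the second half".
        candidates = first if is_slow(first) else candidates[half:]
    return candidates[0]


if __name__ == "__main__":
    rules = [f"RULE_{i}" for i in range(40)]
    # Stand-in predicate for the demo: pretend RULE_27 is the slow one.
    print(bisect_culprit(rules, lambda subset: "RULE_27" in subset))  # RULE_27
```

Each round halves the candidates. For example, 100 custom rules take about seven rounds of testing instead of 100 individual runs.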

Go hunt down the bugger that, in conjunction with this specific sample,
kills your performance. Once you've found it, maybe you can post it?


> I have a couple of instances of [a-z]+ and similar;  is that effectively 
> as troublesome as .+ or .*?

On its own (i.e. not nested inside an alternation, etc.), that is very
unlikely to be the issue, since the slowdown appears to be triggered by
the HTML in the message.
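The following small Python sketch illustrates the difference. Python's `re` is a backtracking matcher like Perl's, but Perl applies optimizations that Python lacks, so absolute timings will differ. Both patterns are made up for the demonstration. Each one runs against a string of letters ending in `-!`, so neither pattern can match and the engine has to exhaust every alternative before giving up.

```python
import re
import time


def time_search(pattern: str, text: str) -> tuple[bool, float]:
    """Run one re.search and return (matched, elapsed seconds)."""
    compiled = re.compile(pattern)  # compile outside the timed region
    start = time.perf_counter()
    matched = compiled.search(text) is not None
    return matched, time.perf_counter() - start


PLAIN = r"[a-z]+!"         # lone quantifier: fails quickly
NESTED = r"(?:[a-z]+)+!"   # quantifier inside a quantifier: 2**(n-1) ways to split n letters

for n in (12, 16, 20):
    # Letters followed by "-!": a "!" is present, but never directly after a
    # letter, so both patterns must fail after trying every alternative.
    text = "a" * n + "-!"
    _, plain_s = time_search(PLAIN, text)
    _, nested_s = time_search(NESTED, text)
    print(f"n={n:2d}  plain={plain_s:.6f}s  nested={nested_s:.6f}s")
```

The lone `[a-z]+` should stay in the microsecond range, while the nested form roughly doubles with every extra letter. The usual fix is to rewrite nested quantifiers so that each character can be consumed in only one way, or to use an atomic group.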


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
