On Fri, 2011-05-27 at 13:14 -0400, Kris Deugau wrote:
> Karsten Bräckelmann wrote:

> > Yes, that sounds like the culprit indeed is one or more custom rules. If
> > that "much faster" equals twice as fast,
> 
> Probably closer to 4-6x;  dual PIII/866 -> Core i3 3GHz.

Sure -- "twice" was just a quickly assumed lower bound. Even that still
shows the dramatic difference, with the custom rule burning a whopping
25 times the CPU.

> > Bisection is your friend.
> >
> > Go hunt down that bugger, that in conjunction with the specific sample
> > kills your performance. Once you found it, maybe you can post it?
> 
> Seems to have been this:
> 
> rawbody TOO_MANY_DIVS /(?:<[Dd][Ii][Vv]>(?:\s|\n|\&nbsp\;)*){6}/

Aha! Yes, that nesting of quantifiers sure looks like a prime candidate,
even though this isn't the pure evil form -- which would be to have two
alternatives with overlap in sub-patterns.

Or maybe it is. Frankly, I am not sure what exactly causes the RE to go
berserk.

> Changing the * to {,100} drops the processing time down to ~8s.

Confirmed, grabbed your sample and this eliminates the issue.

However, using (?:\s|\&nbsp;)* also does the trick. Yes, that keeps the
nasty asterisk quantifier. The only difference is dropping the \n from
the alternation, which \s already covers anyway.
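
To illustrate that the \n alternative is redundant, here is a minimal
sketch. It uses Python rather than Perl, purely as an assumption for
illustration: Python's re module is a different engine, so this only
shows what the two patterns match, not the timing seen in SpamAssassin.
The sample string is made up and is not the message from this thread.

```python
import re

# The original rule body and the variant without the redundant \n.
# Transcribed for Python's re module; the \& and \; escapes from the
# Perl rule are dropped because & and ; need no escaping.
ORIGINAL = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|\n|&nbsp;)*){6}")
SIMPLIFIED = re.compile(r"(?:<[Dd][Ii][Vv]>(?:\s|&nbsp;)*){6}")

# \n is already covered by \s, so the middle alternative adds nothing.
newline_is_whitespace = re.fullmatch(r"\s", "\n") is not None

# Hypothetical sample: six <div> tags separated by mixed whitespace.
sample = "<div>\n <DIV>&nbsp;\n<div>\t<Div>\n\n<div> &nbsp;<div>\n"

# Both patterns should match exactly the same span of the sample.
same_span = ORIGINAL.search(sample).span() == SIMPLIFIED.search(sample).span()
print(newline_is_whitespace, same_span)
```

Both values come out True: the two patterns accept the same text, so
dropping the \n changes only how the engine gets there.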

I wonder whether this is a case where Perl fails to optimize out the \n,
which would result in an alternation with overlap...
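
If that guess is right, the cost is easy to estimate. A rough sketch in
Python, assuming a naive backtracking engine with no optimizations at
all (Perl's real engine may well behave differently, and the function
name is made up for this example): in a run of n newlines followed by
something that makes the overall match fail, each newline can be
consumed by either \s or \n, and the engine retries the rest of the
pattern after every possible stopping point.

```python
def continuation_attempts(run_length: int, overlapping_alternatives: int) -> int:
    """Count how often a naive backtracker retries the rest of the pattern
    after (?:alt1|alt2|...)* when the overall match is doomed to fail.

    run_length: number of consecutive characters the group can consume.
    overlapping_alternatives: how many alternatives match each character.
    """
    if run_length == 0:
        return 1  # nothing left to consume: try the continuation once
    # Each matching alternative consumes one character and recurses;
    # afterwards the engine also tries stopping right here.
    return 1 + overlapping_alternatives * continuation_attempts(
        run_length - 1, overlapping_alternatives
    )


for n in (5, 10, 20):
    # (?:\s|\n)* : two alternatives match a newline; (?:\s)* : only one
    print(n, continuation_attempts(n, 2), continuation_attempts(n, 1))
```

Under these assumptions the count is 2^(n+1) - 1 with the overlap and
n + 1 without it: 2097151 versus 21 attempts for a run of 20 newlines.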


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
