On Fri, 2010-08-06 at 13:51 -0400, Randy Ramsdell wrote: > > > body __RCR_MEGADK /.*(M.*E.*G.*A.*D.*K).*/ > > > > There are a few things that strike me as peculiar about that rule. Not > > least of which is that it would appear to match the following - > > hypothetical, but plausible - message. The presence of seven > > unrestricted greedy specifiers makes it perfectly plausible to me that > > it would take quite a long time to process any moderately long message.
> It does take a long time to process a message and a very short message > to boot. In fact, it never finishes and runs the cpus to 100% so the > rule has been removed. I still wonder if this is a bug. It is a bug with the RE. It leads straight to backtracking hell. OK, what happened there? First, the greedy /.*/ at the beginning slurps in the entire string. Then the Perl RE engine removes one char at a time, in an effort to match an M -- the last occurrence. If it found one, the second greedy /.*/ slurps in the entire remainder of the string, and the backtracking game begins again, this time matching an E. And so on. However, each time this process does not result in a match for the entire RE, backtracking will cause the above process to go back a step (a char in that RE), and start from there. It is not a bug with SA, neither with Perl. It is the pathological RE that causes an insane amount of backtracking with long strings, in particular if there are many candidates -- that means upper-case chars. The worst problem is the use of multiple, unbound, greedy /.*/ patterns. As has been mentioned before, it would be much better to use a bound variant like /.{0,10}/ instead. Also, using non-greedy variants likely would help, too. As would using a char-class, rather than accepting any char -- at the very least, that will help accuracy and prevent matches on innocent text. guenther -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}