On Mon, 2011-10-24 at 13:58 -0700, Adam Katz wrote:
> On 10/23/2011 06:44 PM, Karsten Bräckelmann wrote:
> > [...] as I read it, the warning is referring to the usage of the
> > special $&, $` and $' match capturing variables, resulting in a
> > substantial performance penalty -- and mentions the non-capturing
> > extended regex in this *context*, since it uses the same mechanism
> > for the $n matches. If these special vars are used.
>
> Using special variables like those you mentioned are particularly bad,
> especially with some of the older versions of perl (I seem to recall
> some of them getting big performance boosts in more recent perl
> revisions). That's not to say that the extra memory consumption from an
> unnecessary grouping doesn't impact performance.

Well, does it? Measurably? Even if the RE does *not* match?

If so, does it still have any measurable effect if we're talking about a
handful of custom rules, with stock rules using non-capturing grouping?

(The objective here is a trade-off between optimized REs and not
confusing users who aren't intimately familiar with REs. REs tend to get
hard to grasp rather quickly, and the extra weird ?: chars don't help.)

> > Now, I just grepped the entire SA source code, and NONE of these
> > special vars are used. Yay! (I did not grep all external SA
> > dependencies, mind you.)
>
> I'm guessing I'm not the only person that looks through the rules
> periodically for such things, including frivolous portions like the glob
> in /foo.*/ or the range in /bar\W{2,30}/ and wipe them out to become
> e.g. /foo/ and /bar\W{2}/

Sure, those are bad, but an entirely different beast.

> > So, does this "substantial performance penalty" using capturing
> > groups even apply to SA?
> >
> > Is it really worth it, religiously using non-capturing grouping?
>
> From the profiling I've seen, yes it is. (I don't have data to share
> though, sorry).

The profiled code, does it use the special match capturing variables
*anywhere* in the entire program? The profiled and compared versions,
would that be the equivalent of using capturing vs non-capturing in all
SA stock rules?

Not trying to be confrontational, just honestly asking and wondering
about the real impact. After all, the perlre docs strongly recommend
non-capturing grouping basically once only -- in the warning paragraph
about the special vars.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
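
For reference, a minimal sketch of the pattern rewrites discussed in this thread. It is written in Python as a stand-in, since its re module accepts the same (?:...) syntax as Perl. It only checks that each pair of patterns hits on the same inputs. It says nothing about the Perl performance question, which would have to be profiled in Perl itself. The sample strings, the (foo|bar)\W rule and the helper name same_hits are made up for illustration and are not taken from any SA rule set.

```python
import re

# Hypothetical sample inputs, chosen to include hits and misses.
SAMPLES = ["foo", "xfoobar", "fo", "bar!!", "bar!?--- x", "bar!x", "xfoo!", ""]

# Each pair: (pattern as written, rewritten pattern).
PAIRS = [
    (r"(foo|bar)\W", r"(?:foo|bar)\W"),  # capturing vs non-capturing group
    (r"foo.*", r"foo"),                  # trailing glob dropped
    (r"bar\W{2,30}", r"bar\W{2}"),       # open-ended range cut to its minimum
]


def same_hits(pattern_a, pattern_b, samples):
    """Return True if both patterns hit (or both miss) on every sample."""
    return all(
        bool(re.search(pattern_a, s)) == bool(re.search(pattern_b, s))
        for s in samples
    )


if __name__ == "__main__":
    for original, rewritten in PAIRS:
        print(original, "vs", rewritten, "->", same_hits(original, rewritten, SAMPLES))

    # The only visible difference between the first pair is the saved group.
    print(re.search(PAIRS[0][0], "xfoo!").groups())
    print(re.search(PAIRS[0][1], "xfoo!").groups())
```

For a rule that only asks "did it hit?", each rewrite leaves the result unchanged on these samples; the capturing variant additionally stores the matched alternative, which is the extra work the thread is asking about.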