On 8/4/21 6:15 PM, Tom Lane wrote: > Here's a little finger exercise that improves a case that's bothered me > for awhile. In a POSIX regexp, parentheses cause capturing by default; > you have to write the very non-obvious "(?:...)" if you don't want the > matching substring to be reported by the regexp engine.
It's not obscure to perl programmers :-) > That'd be fine > if capturing were cheap, but with our engine it is not particularly > cheap. In many situations, the initial DFA check is sufficient to > tell whether there is an overall match, but it does not tell where any > subexpression match boundaries are. To identify exactly which substring > is deemed to match a parenthesized subexpression, we have to recursively > break down the match, which takes at the very least a few more DFA > invocations; and with an uncooperative regex, it can easily result in > O(N^2) behavior where there was none at the DFA stage. > > Therefore, we really ought to expend some effort to not capture > subexpressions if the sub-match data is not actually needed, which in > many invocations we know that it isn't. Spencer's original code has > a REG_NOSUB option that looks like it ought to be good for this ... but > on closer inspection it's basically useless, because it turns *all* > parens into non-capturing ones. That breaks back-references, so unless > you know that the regexp contains no back-refs, you can't use it. In perl you can use the 'n' modifier for this effect (since 5.22) I would expect to know if a back-ref were present. I'm a bit worried about how you'll keep track of back-ref numbering since back-refs only count capturing groups, and you're silently turning a capturing group into a non-capturing group. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com