Re: Another regexp performance improvement: skip useless paren-captures

Tom Lane Mon, 09 Aug 2021 18:11:40 -0700

Mark Dilger <mark.dil...@enterprisedb.com> writes:
>> On Aug 9, 2021, at 4:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> There is a potentially interesting definitional question:
>> what exactly ought this regexp do?
>>      ((.)){0}\2
>> Because the capturing paren sets are zero-quantified, they will
>> never be matched to any characters, so the backref can never
>> have any defined referent.


> Perl regular expressions are not POSIX, but if there is a principled reason 
> POSIX should differ from perl on this, we should be clear what that is:

>     if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
>     {
>         print "captured 1 $1\n" if defined $1;
>         print "captured 2 $2\n" if defined $2;
>         print "captured 3 $3\n" if defined $3;
>         print "captured 4 $4\n" if defined $4;
>         print "match = $match\n" if defined $match;
>     }

Hm.  I'm not sure that this example proves anything about Perl's handling
of the situation, since you didn't use a backref.  I tried both

        if ('foo' =~ m/((.)){0}\1/)

        if ('foo' =~ m/((.)){0}\2/)

and while neither throws an error, they don't succeed either.
So AFAICS Perl is acting in the way I'm attributing to POSIX.
But maybe we should actually read POSIX ...
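
A minimal sketch for reproducing the two backref tests above. The helper
name try_pattern is hypothetical and not from the thread. It compiles each
pattern inside eval, so a compile-time error would be reported separately
from a plain failure to match:

```perl
use strict;
use warnings;

# Classify a pattern (given as a string) against a subject:
# 'match', 'no match', or 'error: ...' if the pattern fails to compile.
sub try_pattern {
    my ($subject, $pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return "error: $@" unless defined $re;
    return $subject =~ $re ? 'match' : 'no match';
}

# Single-quoted strings keep the backslash, so \1 and \2 reach the
# regex engine as backrefs.
for my $pattern ('((.)){0}\1', '((.)){0}\2') {
    printf "%-12s => %s\n", $pattern, try_pattern('foo', $pattern);
}
```

Going by the behavior described above, both patterns should come back as
"no match" rather than "error".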

>> ... I guess Spencer did think about this to some extent -- he
>> just forgot about the possibility of nested parens.

> Ugg.  That means our code throws an error where perl does not, pretty
> well negating my point above.  If we're already throwing an error for
> this type of thing, I agree we should be consistent about it.  My
> personal preference would have been to do the same thing as perl, but it
> seems that ship has already sailed.

Removing an error case is usually an easier sell than adding one.
However, the fact that the simplest case (viz, '(.){0}\1') has always
thrown an error and nobody has complained in twenty-ish years suggests
that nobody much cares.

                        regards, tom lane
