Right. Given a longer sequence of a's, "(a*)\1" allows a wide variety
of matches. But in fact, this is not what I was trying to use. I was
looking at "(a)\1*", which should match exactly what "a+" matches. As
matching is greedy, "(a)\1*" should consume all the a's in a sequence in
one go, just like "a+" does...?!
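
A minimal sketch of that expectation, using Python's "re" module purely as
a stand-in: it is a backtracking engine, not PostgreSQL's implementation, so
it shows only that the two patterns describe the same matches, not what
either one costs in PostgreSQL. The helper name matched_text and the sample
strings are made up for illustration.

```python
import re

# Python's backtracking engine, used here only for illustration.
# PostgreSQL's regex engine is a different implementation.
BACKREF = re.compile(r"(a)\1*")   # one 'a', then repeated copies of group 1
PLAIN = re.compile(r"a+")         # one or more 'a'


def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None if no match."""
    m = pattern.match(text)
    return m.group(0) if m else None


# Hypothetical sample inputs: empty, single, run, run with tail, run not at start.
samples = ["", "a", "aaa", "aaab", "baaa", "a" * 1000]

for s in samples:
    # Both patterns should agree on every sample.
    assert matched_text(BACKREF, s) == matched_text(PLAIN, s)
```

Agreement on the matched text says nothing about how an engine gets there;
an engine may still treat the backreference form very differently from the
plain "a+".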

Regards,
Ingolf


On Fri, Aug 20, 2021 at 6:52 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> "Markhof, Ingolf" <ingolf.mark...@de.verizon.com> writes:
> > thank you very much for your reply. Actually, I was assuming all these
> > regular expressions are based on the same core implementation.
>
> They are not.  There are at least three fundamentally different
> implementation technologies (DFA, NFA, hybrid).  Friedl's "Mastering
> Regular Expressions" cites multiple different programs using each
> of those, every one of which behaves a bit differently when you start
> poking at corner cases.  And that's just in the open-source world;
> I don't know what Oracle is using, but I bet it ain't open source.
>
> > I am also surprised that you say the (\1)+ subpattern is computationally
> > expensive. Regular expressions are greedy by default. I.e. in case of a*
> > matching against a string of 1000 a's, the system will not try a, aa, aaa,
> > ... and so on, right? Instead, it will consume all the a's in one go.
>
> "a*" is easy.  "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.
>
>                         regards, tom lane
>
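
The point about "(a*)\1" can be sketched as follows, again in Python's "re"
module as an assumption-laden stand-in (a backtracking engine, not the
PostgreSQL one). The helper name captured_length is hypothetical. For the
whole string to match, the greedy a* has to give back characters until the
captured group is exactly half of the input.

```python
import re

# Python's backtracking engine, for illustration only.
PATTERN = re.compile(r"(a*)\1")


def captured_length(text):
    """Length of group 1 when the WHOLE text matches, else None."""
    m = PATTERN.fullmatch(text)
    return None if m is None else len(m.group(1))


# Four a's: a* cannot keep all four, it must settle on two so that \1
# can match the remaining two.
assert captured_length("aaaa") == 2

# Five a's: no split into two equal halves exists, so no full match.
assert captured_length("aaaaa") is None

# Without anchoring at the end, the best the pattern can do on five a's
# is "aa" + "aa", leaving the last character unmatched.
m = PATTERN.match("aaaaa")
assert m.group(0) == "aaaa" and m.group(1) == "aa"
```

So greedy consumption alone does not settle the match here: the length that
a* ends up with depends on what the backreference can still find afterwards.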

