Right. For a longer sequence of a's, "(a*)\1" allows a wide variety of matches. But that is not actually what I was trying to use. I was rather looking at "(a)\1*", which should match exactly what "a+" matches. Since matching is greedy, "(a)\1*" should consume all the a's in a sequence in one go, just like "a+" does...?!
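For illustration only, here is a quick check of that expectation using Python's `re` module. Note that `re` is a backtracking engine, not the engine PostgreSQL uses, so corner-case behavior may differ. The sample strings and the `runs` helper are made up for this sketch:

```python
import re

def runs(pattern, s):
    """Hypothetical helper: every non-overlapping match of pattern in s."""
    return [m.group(0) for m in re.finditer(pattern, s)]

# Made-up sample strings: single a, a long run, runs embedded in other text, no a's.
samples = ["a", "aaaa", "baaab", "xyz", "aa aaa"]

for s in samples:
    # "(a)\1*": one captured 'a', then greedily as many repeats of it as possible.
    # "a+": greedily one or more a's.
    print(repr(s), runs(r"(a)\1*", s), runs(r"a+", s))
```

In this engine both patterns pick up each run of a's whole, because the starred backreference can only ever repeat the single captured "a". That says nothing about how expensive the backreference is for a different engine to evaluate.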
Regards,
Ingolf

On Fri, Aug 20, 2021 at 6:52 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> "Markhof, Ingolf" <ingolf.mark...@de.verizon.com> writes:
> > thank you very much for your reply. Actually, I was assuming all these
> > regular expressions are based on the same core implementation.
>
> They are not. There are at least three fundamentally different
> implementation technologies (DFA, NFA, hybrid). Friedl's "Mastering
> Regular Expressions" cites multiple different programs using each
> of those, every one of which behaves a bit differently when you start
> poking at corner cases. And that's just in the open-source world;
> I don't know what Oracle is using, but I bet it ain't open source.
>
> > I am also surprised that you say the (\1)+ subpattern is computationally
> > expensive. Regular expressions are greedy by default. I.e. in case of a*
> > matching against a string of 1000 a's, the system will not try a, aa,
> > aaa, ... and so on, right? Instead, it will consume all the a's in one go.
>
> "a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.
>
> regards, tom lane

======================================================================
Verizon Deutschland GmbH - Sebrathweg 20, 44149 Dortmund, Germany - Amtsgericht Dortmund, HRB 14952 - Geschäftsführer: Detlef Eppig - Vorsitzender des Aufsichtsrats: Francesco de Maio
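To make Tom's point about "(a*)\1" concrete, here is a sketch of a naive backtracking matcher for that one pattern. It assumes the pattern must match the whole string (as with `re.fullmatch`), and `match_double` is a hypothetical helper, not how PostgreSQL or any real engine is implemented. It records which lengths group 1 tries before a match is found:

```python
import re

def match_double(s):
    """Hypothetical helper: match ^(a*)\\1$ against s by explicit backtracking.

    Returns (length captured by group 1 or None, group-1 lengths tried in order).
    """
    # Greedy start: group 1 first grabs the whole leading run of a's.
    run = len(s) - len(s.lstrip("a"))
    tried = []
    for k in range(run, -1, -1):  # give back one 'a' after each failed attempt
        tried.append(k)
        # \1 must repeat exactly what group 1 captured and then end the string.
        if s[k:] == s[:k]:
            return k, tried
    return None, tried

for s in ["aaaa", "aaa", "aaaaaa"]:
    k, tried = match_double(s)
    m = re.fullmatch(r"(a*)\1", s)  # Python's engine, for comparison
    print(s, "tried:", tried, "group 1:", k, "re:", m.group(1) if m else None)
```

On "aaaa" the greedy first attempt (group 1 = all four a's) leaves nothing for \1, so the matcher has to give characters back until group 1 holds "aa". A pure "a*" never needs this kind of retry, which is why the backreference is the expensive part.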