Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Markhof, Ingolf
You are right, I also found the same behaviour when using e.g. the UNIX sed command. Ingolf On Mon, Aug 23, 2021 at 4:24 PM Francisco Olarte wrote: > Ingolf: > > On Mon, Aug 23, 2021 at 2:39 PM Markhof, Ingolf > wrote: > > Yes, When I use (\1)? instead of (\1)+, the expression is evaluated > q

Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Francisco Olarte
Ingolf: On Mon, Aug 23, 2021 at 2:39 PM Markhof, Ingolf wrote: > Yes, When I use (\1)? instead of (\1)+, the expression is evaluated quickly, > but it doesn't return what I want. Once a word is written, it is not subject > to matching again. i.e. > select regexp_replace( --> remove double entri

Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Markhof, Ingolf
Argh... Yes, When I use (\1)? instead of (\1)+, the expression is evaluated quickly, but it doesn't return what I want. Once a word is written, it is not subject to matching again. i.e. select regexp_replace( --> remove double entries 'one,one,one,two,two,three,three', '([^,]+)(,\1)?($|,)

Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-23 Thread Markhof, Ingolf
Right. Considering a longer sequence of a's, "(a*)\1" allows a wide variety of matches. But in fact, this is not what I was trying to use. I was more looking at "(a)\1*" which shall match exactly what "a+" matches. As matching is greedy, "(a)\1*" shall consume all a's in a sequence in one go, just
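The claim that "(a)\1*" should match exactly what "a+" matches can be checked empirically. The sketch below uses Python's `re` module purely as a convenient backtracking engine; it is not the engine PostgreSQL uses, so it says nothing about PostgreSQL's run time. The helper name `same_span` and the sample strings are made up for illustration.

```python
import re

# Assumption: Python's `re` stands in for "some regex engine with backrefs".
# It is NOT PostgreSQL's implementation.
BACKREF = re.compile(r"(a)\1*")  # one 'a', then any number of repeats of it
PLAIN = re.compile(r"a+")        # one or more 'a'


def same_span(text: str) -> bool:
    """Return True if both patterns find the same first match in `text`."""
    m1 = BACKREF.search(text)
    m2 = PLAIN.search(text)
    if m1 is None or m2 is None:
        # Equivalent only if neither pattern matched.
        return m1 is None and m2 is None
    return m1.span() == m2.span()


# Hypothetical sample inputs, including one without any 'a'.
samples = ["a", "aaaa", "bbaaac", "xyz", "a" * 500]
results = [same_span(s) for s in samples]
print(results)
```

Agreement on matched spans does not imply equal cost: an engine may still handle the backreference form through a slower code path.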

Re: [E] Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Mark Dilger
> On Aug 20, 2021, at 12:51 PM, Miles Elam wrote: > > Unbounded ranges seem like a problem. Seems so. The problem appears to be in regcomp.c's repeat() function which handles {1,SOME} differently than {1,INF} > Seems worth trying a range from 1 to N where you play around with N to find >

Re: [E] Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Miles Elam
On Fri, Aug 20, 2021 at 12:32 PM Mark Dilger wrote: > > The following queries take radically different time to run: > Unbounded ranges seem like a problem. Seems worth trying a range from 1 to N where you play around with N to find your optimum performance/functionality tradeoff. {1,20} is like

Re: [E] Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Mark Dilger
> On Aug 20, 2021, at 9:52 AM, Tom Lane wrote: > > "a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the > whole string, you will not get a match, even though one is possible. > In general, backrefs create a mess in what would otherwise be a pretty > straightforward concept :-
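Tom Lane's point about "(a*)\1" can be seen in a small experiment. This is a minimal sketch using Python's `re` module as a stand-in backtracking engine (an assumption; PostgreSQL's regex engine is a different implementation). The variable names are illustrative only.

```python
import re

# Assumption: Python's `re` is used only to illustrate the matching
# semantics, not PostgreSQL's implementation or performance.
pattern = re.compile(r"(a*)\1")

# Six a's: the group cannot keep all six, because \1 would then have
# nothing left to match. The engine has to give back characters until
# the group holds exactly half of the string.
even = pattern.fullmatch("a" * 6)
print(even.group(1))

# Five a's: no split into two equal halves exists, so a full match fails.
odd = pattern.fullmatch("a" * 5)
print(odd)

# Anchored only at the start, the longest match on five a's covers four.
prefix = pattern.match("a" * 5)
print(prefix.group(0), prefix.group(1))
```

A greedy "take everything" strategy is therefore wrong for the group: the right length for `a*` depends on what the backreference needs afterwards.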

Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Tom Lane
"Markhof, Ingolf" writes: > thank you very much for your reply. Actually, I was assuming all these > regular expressions are based on the same core implementation. They are not. There are at least three fundamentally different implementation technologies (DFA, NFA, hybrid). Friedl's "Mastering

Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Markhof, Ingolf
Thank you very much for all your proposals! Ingolf

Re: [E] Re: Regexp_replace bug / does not terminate on long strings

2021-08-20 Thread Markhof, Ingolf
Hi Tom, thank you very much for your reply. Actually, I was assuming all these regular expressions are based on the same core implementation. Interestingly, this doesn't seem to be true... I am also surprised that you say the (\1)+ subpattern is computationally expensive. Regular expressions are

Re: Regexp_replace bug / does not terminate on long strings

2021-08-19 Thread Michael Lewis
Btw, my apologies for top-posting. I think my caffeine wore off.

Re: Regexp_replace bug / does not terminate on long strings

2021-08-19 Thread Michael Lewis
If you need it ordered, this is a bit awkward but works and returns for me in about 5ms on my dev machine. select string_agg( value, ',' ) As final_result from( select value, min( row_num ) as min_row_num from( select sub.value, row_number() over () as row_num from ( select unnest( strin
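The idea behind this query, as far as the visible part shows, is to split the list, keep the first occurrence of each value, and join the values back in their original order, with no regular expression involved. The SQL above is truncated, so the following is not a translation of it; it is a sketch of the same idea in Python, and the function name `dedupe_keep_order` is hypothetical.

```python
def dedupe_keep_order(csv_words: str) -> str:
    """Drop repeated words from a comma-separated list.

    The first occurrence of each word is kept, and the order of first
    occurrences is preserved (dicts keep insertion order in Python 3.7+).
    """
    return ",".join(dict.fromkeys(csv_words.split(",")))


# Sample input taken from the thread's regexp_replace example.
print(dedupe_keep_order("one,one,one,two,two,three,three"))
```

This runs in time linear in the number of words, which is why a split-and-aggregate approach avoids the slowdown seen with the backreference pattern.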

Re: Regexp_replace bug / does not terminate on long strings

2021-08-19 Thread Tom Lane
"Markhof, Ingolf" writes: > BRIEF: > regexp_replace(source,pattern,replacement,flags) needs very (!) long to > complete or does not complete at all (?!) for big input strings (a few k > characters). (Oracle SQL completes the same in a few ms) Regexps containing backrefs are inherently hard --- ev

Regexp_replace bug / does not terminate on long strings

2021-08-19 Thread Markhof, Ingolf
BRIEF: regexp_replace(source,pattern,replacement,flags) needs very (!) long to complete or does not complete at all (?!) for big input strings (a few k characters). (Oracle SQL completes the same in a few ms) VERBOSE Given a comma-separated list of "words" (where a word is any sequence of char