Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-05 Thread Tom Lane
Martijn van Oosterhout writes: > On the otherhand, I think requiring an "overall longest match" makes > your implementation non-polynomial complexity. Only if you don't know how to implement it -- a DFA-based implementation doesn't have much trouble with this. > [ equivalence of knapsack problem

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-05 Thread Martijn van Oosterhout
On Mon, Mar 05, 2012 at 11:28:24AM -0500, Tom Lane wrote: > Robert Haas writes: > > I think the right way to imagine this is as though the regular > > expression were being matched to the source text in left-to-right > > fashion. > > No, it isn't. You are headed down the garden path that leads t

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-05 Thread Robert Haas
On Mon, Mar 5, 2012 at 11:28 AM, Tom Lane wrote: > Robert Haas writes: >> I think the right way to imagine this is as though the regular >> expression were being matched to the source text in left-to-right >> fashion. > > No, it isn't.  You are headed down the garden path that leads to a > Perl-s

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-05 Thread Tom Lane
Robert Haas writes: > I think the right way to imagine this is as though the regular > expression were being matched to the source text in left-to-right > fashion. No, it isn't. You are headed down the garden path that leads to a Perl-style definition-by-implementation, and in particular you are

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-05 Thread Brendan Jurd
On 5 March 2012 17:23, Robert Haas wrote: > This is different from what Perl does, but I think Perl's behavior > here is batty: given a+|a+b+ and the string aaabbb, it picks the first > branch and matches only aaa. Yeah, this is sometimes referred to as "ordered alternation", basically that the b

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-04 Thread Robert Haas
On Sun, Mar 4, 2012 at 3:20 PM, Tom Lane wrote: > hubert depesz lubaczewski writes: >> I stand on position that mixing greedy and non-greedy operators should >> be possible, and that it should work according to POLA - i.e. greedines >> of given atom shouldn't be influenced by other atoms. > > [ s

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-04 Thread Brendan Jurd
On 5 March 2012 04:34, Tom Lane wrote: > Brendan Jurd writes: >> On 4 March 2012 17:53, Tom Lane wrote: >>> Brendan Jurd writes: >> While it's true that POSIX doesn't contemplate non-greed, after >> reading the spec I would have expected an expression *as a whole* to >> still prefer the longest

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-04 Thread Tom Lane
hubert depesz lubaczewski writes: > I stand on position that mixing greedy and non-greedy operators should > be possible, and that it should work according to POLA - i.e. greedines > of given atom shouldn't be influenced by other atoms. [ shrug... ] That sounds good, but it's pretty much vacuous

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-04 Thread hubert depesz lubaczewski
On Sun, Mar 04, 2012 at 12:34:22PM -0500, Tom Lane wrote: > Well, that's just an arbitrary example. The cases I remember people > complaining about in practice were the other way round: greedy > quantifier followed by non-greedy, and they were unhappy that the > non-greediness was effectively not

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-04 Thread Tom Lane
Brendan Jurd writes: > On 4 March 2012 17:53, Tom Lane wrote: >> Brendan Jurd writes: >>> I'll admit that this is a pretty obscure point, but we do appear to be >>> in direct violation of POSIX here. >> How so? POSIX doesn't contain any non-greedy constructs. If you use >> only the POSIX-comp

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-04 Thread Brendan Jurd
On 4 March 2012 17:53, Tom Lane wrote: > Brendan Jurd writes: >> I'll admit that this is a pretty obscure point, but we do appear to be >> in direct violation of POSIX here. > > How so?  POSIX doesn't contain any non-greedy constructs.  If you use > only the POSIX-compatible greedy constructs, th

Re: [HACKERS] Our regex vs. POSIX on "longest match"

2012-03-03 Thread Tom Lane
Brendan Jurd writes: > I am in the process of accelerating down the rabbit hole of regex > internals. Something that came up during my reading, is that a POSIX > compliant regex engine ought to always prefer the longest possible > match, when multiple matches are possible beginning from the same

[HACKERS] Our regex vs. POSIX on "longest match"

2012-03-03 Thread Brendan Jurd
Hello folks, I am in the process of accelerating down the rabbit hole of regex internals. Something that came up during my reading, is that a POSIX compliant regex engine ought to always prefer the longest possible match, when multiple matches are possible beginning from the same location in the