Re: [9fans] non greedy regular expressions

Rudolf Sykora Fri, 24 Oct 2008 13:02:09 -0700

> In that model, it is not accurate to describe the * + ?
> operators as greedy or not.  None of them is working
> toward any goal other than the overall longest match
> at the leftmost position.


So then I must be mistaken about the terminology.
I thought greedy=leftmost-longest, while non-greedy=leftmost-first:
Given the text
bla bla (AB)(CDEF)(GH)
the pattern
/\(.*\)/
matches the whole (AB)(CDEF)(GH), which I would call 'greedy', while if
'*' were 'non-greedy', I'd expect a match with just (AB). Plan 9 does
not currently offer this.
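
To make the difference concrete, here is a minimal sketch in Python, whose
re module is one of the Perl-style (leftmost-first) engines and so has
both '*' and '*?'. The variable names are only illustrative; this is not
how Plan 9's regexp behaves.

```python
import re

text = "bla bla (AB)(CDEF)(GH)"

# Greedy '*': '.' consumes as much as it can, so the match runs
# from the first '(' to the last ')'.
greedy = re.search(r"\(.*\)", text).group(0)

# Non-greedy '*?': '.' consumes as little as it can, so the match
# stops at the first ')'.
lazy = re.search(r"\(.*?\)", text).group(0)

print(greedy)  # (AB)(CDEF)(GH)
print(lazy)    # (AB)
```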
This particular example is easy to solve, if I want to go in 'bracket'
steps, with
/\([^)]*\)/
but there are harder examples, e.g. where the_interesting_part is
delimited by a more complicated structure ('ABC' here):
ABCthe_interesting_partABC blabla bla ABCthe_interesting_partABC etc.
Now, how to parse it? With non-greedy '*', no problem. With 'greedy'
and no negative lookahead assertion? Ok, maybe 'y' in sam could help
(don't know). What about
ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA etc.?
'Non-greedy' operators simply remove the need to think about any of this.
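
As a rough illustration of the cases above, again in Python's re (a
Perl-style engine, assumed here only because it has '*?'; the variable
names are made up for the example):

```python
import re

brackets = "bla bla (AB)(CDEF)(GH)"
same = "ABCthe_interesting_partABC blabla bla ABCthe_interesting_partABC etc."
mirrored = "ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA etc."

# Single-character delimiter: a negated class is enough,
# no non-greedy operator needed.
bracket_steps = re.findall(r"\([^)]*\)", brackets)

# Multi-character delimiters: non-greedy '*?' stops at the first
# closing delimiter, whether it is 'ABC' or 'CBA'.
same_parts = re.findall(r"ABC(.*?)ABC", same)
mirrored_parts = re.findall(r"ABC(.*?)CBA", mirrored)

# Greedy '*' instead runs on to the last 'ABC' in the line.
greedy_parts = re.findall(r"ABC(.*)ABC", same)

print(bracket_steps)   # ['(AB)', '(CDEF)', '(GH)']
print(same_parts)      # ['the_interesting_part', 'the_interesting_part']
print(mirrored_parts)  # ['the_interesting_part', 'the_interesting_part']
print(greedy_parts)    # ['the_interesting_partABC blabla bla ABCthe_interesting_part']
```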

Ruda



> In Perl and its imitators, the match starts at the leftmost
> position but is otherwise the first one that is found,
> not necessarily the longest.  In that context, words like
> "greedy" and "non-greedy" start to make sense,
> because the behavior of any one operator influences which
> match is encountered first.
>
> Either approach -- leftmost-longest or leftmost-first --
> can be implemented using finite automata.
>
> Russ
>
>
