> In that model, it is not accurate to describe the * + ?
> operators as greedy or not. None of them is working
> toward any goal other than the overall longest match
> at the leftmost position.

So then I must be mistaken about the terminology. I thought
greedy = leftmost-longest, while non-greedy = leftmost-first.

Given the text

    bla bla (AB)(CDEF)(GH)

/\(.*\)/ matches the whole (AB)(CDEF)(GH), which I would call 'greedy',
while if '*' were 'non-greedy', I'd expect a match with just (AB).
That is not available in Plan 9 now.

This example is easy to solve if I want to go in 'bracket' steps,
with /\([^)]*\)/, but there are harder examples, e.g. where
the_interesting_part is delimited by a more complicated structure
('ABC' here):

    ABCthe_interesting_partABC blabla bla ABCthe_interesting_partABC etc.

Now, how to parse it? With a non-greedy '*', no problem. With a 'greedy'
one and no negative lookahead assertion? OK, maybe 'y' in sam could help
(I don't know). And what about

    ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA etc.?

All the thinking about this simply goes away with 'non-greedy' operators.

Ruda

> In Perl and its imitators, the match starts at the leftmost
> position but is otherwise the first one that is found,
> not necessarily the longest. In that context, words like
> "greedy" and "non-greedy" start to make sense,
> because the behavior of any one operator influences which
> match is encountered first.
>
> Either approach -- leftmost-longest or leftmost-first --
> can be implemented using finite automata.
>
> Russ
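
For concreteness, the examples from the reply above can be sketched in Python. This is only an illustration under one assumption: Python's re module is a Perl-style (leftmost-first) matcher, so it shows the 'greedy' and 'non-greedy' behaviour being asked for, not what Plan 9's leftmost-longest regexps do. The strings first_part and second_part are made-up placeholders standing in for the_interesting_part, so the two hits can be told apart.

```python
import re

# Bracket example from the message.
text = "bla bla (AB)(CDEF)(GH)"

# Greedy '*': runs to the last ')' on the line.
greedy = re.search(r"\(.*\)", text).group(0)

# Non-greedy '*?': stops at the first ')'.
lazy = re.search(r"\(.*?\)", text).group(0)

# Negated character class: steps through the bracket groups one by one,
# with no non-greedy operator needed.
steps = re.findall(r"\([^)]*\)", text)

# Multi-character delimiter: a single negated class cannot say "up to ABC",
# so the non-greedy form is the easy way out here.
# (first_part / second_part are placeholder payloads, not from the message.)
delimited = "ABCfirst_partABC blabla bla ABCsecond_partABC etc."
lazy_parts = re.findall(r"ABC(.*?)ABC", delimited)

# The greedy form swallows everything between the first and the last ABC.
greedy_parts = re.findall(r"ABC(.*)ABC", delimited)

# Mirrored closing delimiter (ABC ... CBA).
mirrored = "ABCfirst_partCBA blabla bla ABCsecond_partCBA etc."
mirrored_parts = re.findall(r"ABC(.*?)CBA", mirrored)

print(greedy)          # (AB)(CDEF)(GH)
print(lazy)            # (AB)
print(steps)           # ['(AB)', '(CDEF)', '(GH)']
print(lazy_parts)      # ['first_part', 'second_part']
print(greedy_parts)    # ['first_partABC blabla bla ABCsecond_part']
print(mirrored_parts)  # ['first_part', 'second_part']
```

The negated-class pattern gives the same bracket-by-bracket result without any non-greedy operator, but it only works because the closing delimiter is a single character; with 'ABC' or 'CBA' as the delimiter, the non-greedy form is what keeps the pattern short.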