EXI-Andrews, Jack wrote: > the '*' and '+' don't seem to be greedy.. they will consume less in > order to match a string: > > >>> import re;re.match('(a+)(ab)','aaab').groups() > ('aa', 'ab') > > this is the sort of behaviour i'd expect from > '(a+?)(ab)' > > a+ should greedily consume a's at the expense of the string not matching
It's called backtracking. If a match fails, the regexp engine recursively unwinds some of the greedy matching and tries again. > in real life, i am trying to match #defines: > > >>> re.match( '#define ([A-Za-z0-9_]+)([^(])', '#define > abc(d)').groups() > ('ab', 'c') > > i want this example to fail because the first character after a string > of letters is a '(' > i want to match only #defines without parameters. Probably the easiest thing to do is to attempt to match zero or one opening parentheses, and bail out if it did end up matching the parenthesis. m = re.match(r"#define ([A-Za-z0-9_]+)(\(?)","#define abc(d)") if m and m.groups(2) != "(": ... (BTW, it's good practice to use a raw strings for the regular expressions as I did.) > so what's the definition of greedy? It means match as much you can *while getting a match*. Just remember regexp parsing is not a one way street: if it hits a dead end, it'll go back and try another path. In fact, greediness usually doesn't affect *whether* a regexp matches; it only affects the groupings. I'm suddenly curious if there are any cases at all where greediness changes whether it finds a match. > (pls copy me on responses) Ah, too bad you won't see it then.... Carl Banks -- http://mail.python.org/mailman/listinfo/python-list