Carl Banks wrote: > EXI-Andrews, Jack wrote: >> the '*' and '+' don't seem to be greedy.. they will consume less in >> order to match a string: >> >>>>> import re;re.match('(a+)(ab)','aaab').groups() >> ('aa', 'ab') >> >> this is the sort of behaviour i'd expect from >> '(a+?)(ab)' >> >> a+ should greedily consume a's at the expense of the string not matching > > It's called backtracking. If a match fails, the regexp engine > recursively unwinds some of the greedy matching and tries again. > >> in real life, i am trying to match #defines: >> >>>>> re.match( '#define ([A-Za-z0-9_]+)([^(])', '#define >> abc(d)').groups() >> ('ab', 'c') >> >> i want this example to fail because the first character after a string >> of letters is a '(' >> i want to match only #defines without parameters. > > Probably the easiest thing to do is to attempt to match zero or one > opening parentheses, and bail out if it did end up matching the > parenthesis. > > m = re.match(r"#define ([A-Za-z0-9_]+)(\(?)","#define abc(d)") > if m and m.groups(2) != "(": > ... > > (BTW, it's good practice to use a raw strings for the regular > expressions as I did.) >
Another way that seems clearer to me is to use negative lookahead assertions. >>>defPatt = re.compile(r'#define\s+(\w+)\b(?!\()') >>> defPatt.match("#define abc(x)") >>> defPatt.match("#define foo_BarBaz") <_sre.SRE_Match object at 0xb7dd5820> >>> defPatt.match("#define foo_BarBaz").groups() ('foo_BarBaz',) In general '\w' is the same as [A-Za-z0-9_] There are other considerations too... I don't know if #define abc (x) But the main thing here is the use of '\b' to require there to be a word boundary at the end your match and a (?!\() to require that the match is not followed by a '(' see http://docs.python.org/lib/re-syntax.html noah >> so what's the definition of greedy? > > It means match as much you can *while getting a match*. > > Just remember regexp parsing is not a one way street: if it hits a dead > end, it'll go back and try another path. In fact, greediness usually > doesn't affect *whether* a regexp matches; it only affects the > groupings. I'm suddenly curious if there are any cases at all where > greediness changes whether it finds a match. > >> (pls copy me on responses) > > Ah, too bad you won't see it then.... > > > Carl Banks > -- http://mail.python.org/mailman/listinfo/python-list