On Oct 4, 9:34 pm, Wolfgang Rohdewald <wolfg...@rohdewald.de> wrote:
> Hi,
>
> I want to match a string only if a word (C1 in this example) appears
> at most once in it. This is what I tried:
>
> >>> re.match(r'(.*?C1)((?!.*C1))','C1b1b1b1 b3b3b3b3 C1C2C3').groups()
> ('C1b1b1b1 b3b3b3b3 C1', '')
> >>> re.match(r'(.*?C1)','C1b1b1b1 b3b3b3b3 C1C2C3').groups()
> ('C1',)
>
> but this should not have matched. Why is the .*? behaving greedy
> if followed by (?!.*C1)?
It's not.

> I would have expected that re first
> evaluates (.*?C1) before proceeding at all.

It does. What you're not realizing is that if a regexp search comes to
a dead end, it won't simply return "no match". Instead it'll throw away
part of the match, backtrack to a previously matched variable-length
subexpression, such as ".*?", and try again with a different length.

That's what happened above. At first the group "(.*?C1)" non-greedily
matched the substring "C1", but the lookahead "(?!.*C1)" failed at that
position because another C1 follows. So it backtracked to the ".*?" and
looked for a longer match, which it found.

Here's something to keep in mind: except for a few corner cases, greedy
versus non-greedy will not affect the substring matched, it'll only
affect the groups.

> I also tried:
>
> >>> re.search(r'(.*?C1(?!.*C1))','C1b1b1b1 b3b3b3b3 C1C2C3C4').groups()
> ('C1b1b1b1 b3b3b3b3 C1',)
>
> with the same problem.
>
> How could this be done?

Can't be done with regexps. How you would do this depends somewhat on
your overall goals, but your first look should be toward the string
methods. If you share details with us we can help you choose a better
strategy.

Carl Banks
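
To make the backtracking point and the string-method suggestion concrete,
here is a minimal sketch. It assumes "appears at most once" means counting
non-overlapping occurrences of the literal substring C1; the helper name
appears_at_most_once is made up for this example and is not from the
original post.

```python
import re

s = 'C1b1b1b1 b3b3b3b3 C1C2C3'

# On its own, the lazy group stops at the first C1.
lazy_only = re.match(r'(.*?C1)', s).group(1)

# With the lookahead appended, the first attempt ('C1') is rejected because
# another C1 follows, so the engine backtracks and lets .*? grow until the
# lookahead succeeds, which happens at the last C1 in the string.
with_lookahead = re.match(r'(.*?C1)(?!.*C1)', s).group(1)


def appears_at_most_once(text, word='C1'):
    """Hypothetical helper: True if `word` occurs zero or one times in `text`."""
    # str.count counts non-overlapping occurrences of the substring.
    return text.count(word) <= 1


print(repr(lazy_only))
print(repr(with_lookahead))
print(appears_at_most_once(s))
print(appears_at_most_once('C1b1b1b1 b3b3b3b3 C2C3'))
```

If C1 has to be a whole word rather than a substring, counting tokens after
splitting the string might fit better, but that depends on what the real
data looks like.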