MonkeeSage wrote: > Licheng Fang wrote: > > Basically, the problem is this: > > > > >>> p = re.compile("do|dolittle") > > >>> p.match("dolittle").group() > > 'do' > > >From what I understand, this isn't python specific, it is the expected > behavior of that pattern in any implementation. You are using > alternation, which means "either, or", and you have the shorter > subexpression first, so the condition is satisfied by just 'do' and the > matching terminates. > > > There's another example: > > > > >>> p = re.compile("one(self)?(selfsufficient)?") > > >>> p.match("oneselfsufficient").group() > > 'oneself' > > Again, I don't think this has anything to do with python. You pattern > basically means "match 'one' whether it is followed by 'self' or not, > and whether it is followed by 'selfsufficient' or not". For this > particular example, you'd want something like > "one(self)?(sufficient)?". > > I think you could construct a pattern that would do what you want in > python without any problem. If you post a (short) example of your data, > I'm sure someone could help you with it. > > Regards, > Jordan
Hi, according to these regexp engine discussions, it's NOT a behavior true to any implementation. http://www.softec.st/en/OpenSource/DevelopersCorner/RegularExpressions/RegularExpressionEngines.html http://www.softec.st/en/OpenSource/DevelopersCorner/RegularExpressions/RegularExpressionEngines.html Python's NFA engine reads along the input string, matching it to the pattern, and backtracking when needed. By contrast a DFA engine, to my understanding, constructs a DFA and uses it to munch as many characters as possible. Maybe it's like this: Pattern: one(self)?(selfsufficient)? PYTHON'S NFA ENGINE: one self, none selfsufficient, none (start)------->((1))------------>((2))----------------------->((3)) DFA ENGINE: one self (start)------->((123))------------>((23)) | | | selfsufficient --------------->((3)) I want to know if there is some way to make Python RE behave like grep does, or do I have to change to another engine? -- http://mail.python.org/mailman/listinfo/python-list