Bugs item #1647489, was opened at 2007-01-29 17:35
Message generated for change (Settings changed) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1647489&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jacques Frechet (jfrechet)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: zero-length match confuses re.finditer()

Initial Comment:
Hi!

re.finditer() seems to incorrectly increment the current position immediately after matching a zero-length substring. For example:

>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]

What happened to the 'a'? I expected this result:

[('', None), (None, 'abc')]

Perl agrees with me:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(z*)|(\w+)/g'
"",undef
undef,"abc"
"",undef

Similarly, if I remove the ^:

>>> [m.groups() for m in re.finditer(r'(z*)|(\w+)', 'abc')]
[('', None), ('', None), ('', None), ('', None)]

Now all of the letters have fallen through the cracks! I expected this result:

[('', None), (None, 'abc'), ('', None)]

Again, perl agrees:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(z*)|(\w+)/g'
"",undef
undef,"abc"
"",undef

If this bug has already been reported, I apologize -- I wasn't able to find it here. I haven't looked at the code for the re module, but this seems like the sort of bug that might have been accidentally introduced in order to try to prevent the same zero-length match from being returned forever.

Thanks,
Jacques

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
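
The hypothesis at the end of the report above (that the scanner steps forward after an empty match to avoid returning it forever) can be illustrated with a small sketch. This is not the re module's actual implementation, which the reporter says he has not read; naive_finditer is a hypothetical helper written only for illustration. It calls the compiled pattern's search() method with an explicit start position and, after a zero-length match, unconditionally skips one character. Under that assumption it yields the same group tuples the report shows for both patterns.

```python
import re


def naive_finditer(pattern, text):
    """Hypothetical scanner: after an empty match, skip one character.

    Illustration only; this is an assumption about how such a bug could
    arise, not the real re.finditer() code.
    """
    regex = re.compile(pattern)
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        yield m
        if m.end() == m.start():
            # Zero-length match: step past it so it is not returned forever.
            # This is the step that loses the character at m.end().
            pos = m.end() + 1
        else:
            pos = m.end()


# The two patterns from the report, run through the naive scanner.
anchored = [m.groups() for m in naive_finditer(r'(^z*)|(\w+)', 'abc')]
unanchored = [m.groups() for m in naive_finditer(r'(z*)|(\w+)', 'abc')]
print(anchored)
print(unanchored)
```

A scanner with the behaviour the reporter expects would instead retry at the same position while forbidding only another empty match there, which is what lets (\w+) pick up 'abc' right after the empty match at position 0.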