Bugs item #1647489, was opened at 2007-01-29 22:35
Message generated for change (Settings changed) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1647489&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jacques Frechet (jfrechet)
>Assigned to: Nobody/Anonymous (nobody)
Summary: zero-length match confuses re.finditer()

Initial Comment:
Hi!

re.finditer() seems to incorrectly increment the current position immediately 
after matching a zero-length substring.  For example:

>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]

What happened to the 'a'?  I expected this result:

[('', None), (None, 'abc')]

Perl agrees with me:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" 
while "abc" =~ /(z*)|(\w+)/g' 
"",undef
undef,"abc"
"",undef

Similarly, if I remove the ^:

>>> [m.groups() for m in re.finditer(r'(z*)|(\w+)', 'abc')]
[('', None), ('', None), ('', None), ('', None)]

Now all of the letters have fallen through the cracks!  I expected this result:

[('', None), (None, 'abc'), ('', None)]

Again, perl agrees:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" 
while "abc" =~ /(z*)|(\w+)/g' 
"",undef
undef,"abc"
"",undef
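
The scanning rule implied by the expected results can be sketched in Python. This is only an illustration, not the re module's code: finditer_retry is a hypothetical helper, and it approximates "retry at the same position, but require a non-empty match" by taking the longest non-empty match found with fullmatch(), which may differ from what a real engine's backtracking order would choose.

```python
import re

def finditer_retry(pattern, string):
    """Yield matches; after an empty match, retry at the same position
    and accept only a non-empty match there (hypothetical helper)."""
    pos = 0
    last_empty = None  # position of the previous match, if it was empty
    while pos <= len(string):
        m = pattern.search(string, pos)
        if m is None:
            break
        if m.start() == m.end() == last_empty:
            # The same empty match came back: look for a non-empty
            # match starting here instead (longest first, an assumption).
            m = None
            for end in range(len(string), pos, -1):
                m = pattern.fullmatch(string, pos, end)
                if m is not None:
                    break
            if m is None:
                pos += 1  # nothing non-empty starts here; move on
                continue
        yield m
        pos = m.end()
        last_empty = pos if m.start() == m.end() else None

for pat in (r'(^z*)|(\w+)', r'(z*)|(\w+)'):
    print([m.groups() for m in finditer_retry(re.compile(pat), 'abc')])
```

With this rule the 'a' is no longer skipped, because the position only advances past an empty match when nothing non-empty can start there.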

If this bug has already been reported, I apologize -- I wasn't able to find it 
here.  I haven't looked at the code for the re module, but this looks like a 
side effect of a safeguard meant to keep the same zero-length match from being 
returned forever.
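
To make that guess concrete, here is a minimal model of such a safeguard. It is an assumption about the cause, not the actual re implementation: finditer_skipping is a hypothetical helper that simply restarts the search one character later whenever a match is empty.

```python
import re

def finditer_skipping(pattern, string):
    """Yield matches, stepping one character past every empty match
    (a guessed model of the behaviour reported above)."""
    pos = 0
    while pos <= len(string):
        m = pattern.search(string, pos)
        if m is None:
            break
        yield m
        if m.start() == m.end():
            pos = m.end() + 1  # skip a character to avoid looping forever
        else:
            pos = m.end()

for pat in (r'(^z*)|(\w+)', r'(z*)|(\w+)'):
    print([m.groups() for m in finditer_skipping(re.compile(pat), 'abc')])
```

This model loses the 'a' in the first pattern and every letter in the second, the same results as reported at the top of this message.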

Thanks,
Jacques

----------------------------------------------------------------------