New submission from beardypig <beardy...@protonmail.com>:

I am experiencing an issue with the following regex when using finditer.

    (?=<(?P<tag>\w+)/?>(?:(?P<text>.+?)</(?P=tag)>)?)

(I know it's not the best method of dealing with HTML, and this is a simplified 
version)

For example:

    [m.groupdict() for m in
     re.finditer(r"(?=<(?P<tag>\w+)/?>(?:(?P<text>.+?)</(?P=tag)>)?)",
                 "<test><foo2/></test>")]

In Python 2.7, 3.5, and 3.6 it returns

    [{'tag': 'test', 'text': '<foo2/>'}, {'tag': 'foo2', 'text': None}]

But starting with 3.7 it returns

    [{'tag': 'test', 'text': '<foo2/>'}, {'tag': 'foo2', 'text': '<foo2/>'}]

The "text" group appears to be a copy of the previous "text" group.


Some other examples:

    "<test>Hello</test><foo/>"
        returned: [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': 'Hello'}]
        expected: [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': None}]

    "<test>Hello</test><foo/><foo/>"
        returned: [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': 'Hello'}, {'tag': 'foo', 'text': None}]
        expected: [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': None}, {'tag': 'foo', 'text': None}]

----------
components: Regular Expressions
messages: 322771
nosy: beardypig, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.finditer and lookahead bug
type: behavior
versions: Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34294>
_______________________________________