New submission from beardypig <beardy...@protonmail.com>:
I am experiencing an issue with the following regex when using finditer:

    (?=<(?P<tag>\w+)/?>(?:(?P<text>.+?)</(?P=tag)>)?)

(I know it's not the best method of dealing with HTML, and this is a simplified version.)

For example:

    [m.groupdict() for m in re.finditer(r"(?=<(?P<tag>\w+)/?>(?:(?P<text>.+?)</(?P=tag)>)?)", "<test><foo2/></test>")]

In Python 2.7, 3.5, and 3.6 it returns

    [{'tag': 'test', 'text': '<foo2/>'}, {'tag': 'foo2', 'text': None}]

But starting with 3.7 it returns

    [{'tag': 'test', 'text': '<foo2/>'}, {'tag': 'foo2', 'text': '<foo2/>'}]

The "text" group appears to be a copy of the previous "text" group.

Some other examples:

    "<test>Hello</test><foo/>" => [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': 'Hello'}]
    (expected: [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': None}])

    "<test>Hello</test><foo/><foo/>" => [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': 'Hello'}, {'tag': 'foo', 'text': None}]
    (expected: [{'tag': 'test', 'text': 'Hello'}, {'tag': 'foo', 'text': None}, {'tag': 'foo', 'text': None}])

----------
components: Regular Expressions
messages: 322771
nosy: beardypig, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.finditer and lookahead bug
type: behavior
versions: Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34294>
_______________________________________