New submission from Elias Tarhini:
I believe I've found a bug in the `re` module -- specifically, in the 3.7+
support for splitting on zero-width patterns. Compare Java's behavior...
jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
$1 ==> String[3] { "1", "2", "11" }
...with Python's:
>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '1211')
['1', '1', '2', '2', '11']
(The pattern itself is pretty straightforward in design, but regex syntax can
cloud things, so to be totally clear: it finds any point that follows a digit
and precedes a *different* digit.)
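
To make the intended split points concrete without any regex syntax, here is a minimal sketch that walks the string and cuts wherever a digit is directly followed by a different digit. The function name `split_between_different_digits` is hypothetical and chosen only for illustration, and the sketch assumes ASCII digits only:

```python
import string


def split_between_different_digits(text):
    """Cut `text` at every point that follows a digit and precedes a different digit.

    Hypothetical helper for illustration; only ASCII digits are considered.
    """
    pieces = []
    start = 0
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        # A split point lies between two adjacent digits that differ.
        if prev in string.digits and cur in string.digits and prev != cur:
            pieces.append(text[start:i])
            start = i
    # Whatever remains after the last split point is the final piece.
    pieces.append(text[start:])
    return pieces


print(split_between_different_digits("1211"))  # ['1', '2', '11']
```

For the input '1211' this yields the same three pieces as the Java result shown above.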
* Tested on Python 3.7.1 (Windows 10) and Python 3.7.0 (Linux).
--
components: Regular Expressions
messages: 338581
nosy: Elias Tarhini, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split() incorrectly splitting on zero-width pattern
type: behavior
versions: Python 3.7
___
Python tracker
<https://bugs.python.org/issue36397>
___