[issue36397] re.split() incorrectly splitting on zero-width pattern

2019-03-21 Thread Elias Tarhini


New submission from Elias Tarhini:

I believe I've found a bug in the `re` module -- specifically, in the 3.7+ 
support for splitting on zero-width patterns. Compare Java's behavior...

jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
$1 ==> String[3] { "1", "2", "11" }

...with Python's:

>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '1211')
['1', '1', '2', '2', '11']

(The pattern itself is pretty straightforward in design, but regex syntax can 
cloud things, so to be totally clear: it finds any point that follows a digit 
and precedes a *different* digit.)
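
To make that description concrete, here is a small sketch that lists where the pattern matches in '1211' and what group 1 holds at each point. The helper name `boundaries` is made up for illustration; everything else is the standard `re` module.

```python
import re

# Same pattern as in the report: a point that follows a digit (captured as
# group 1) and precedes a different digit.
PATTERN = re.compile(r'(?<=(\d))(?!\1)(?=\d)')

def boundaries(s):
    """Return (position, captured digit) for every zero-width match in s."""
    return [(m.start(), m.group(1)) for m in PATTERN.finditer(s)]

print(boundaries('1211'))  # [(1, '1'), (2, '2')]
```

The two positions are the boundaries 1|2 and 2|1; the boundary between the final two 1s is not matched because the digits are the same.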

* Tested on 3.7.1 (Windows 10) and 3.7.0 (Linux).

--
components: Regular Expressions
messages: 338581
nosy: Elias Tarhini, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split() incorrectly splitting on zero-width pattern
type: behavior
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue36397>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36397] re.split() incorrectly splitting on zero-width pattern

2019-03-23 Thread Elias Tarhini


Elias Tarhini added the comment:

Thank you. I was too zeroed in on the idea that this came from the zero-width
pattern, and I forgot to consider the capturing group. It looks like
`re.sub(pattern, 'some-delim', s).split('some-delim')` is a way to do this when
it's not possible to use a non-capturing group.
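
A minimal sketch of that workaround, assuming the chosen delimiter never occurs in the input. The helper names `split_via_sub` and `split_dropping_groups` are made up for illustration and are not part of `re`. The second helper is an alternative that relies on the documented behavior that `re.split()` includes the text of every capturing group in its result.

```python
import re

PATTERN = r'(?<=(\d))(?!\1)(?=\d)'

def split_via_sub(pattern, s, delim='\x00'):
    """Insert delim at every match, then split on it with str.split()."""
    if delim in s:
        raise ValueError('delimiter must not occur in the input')
    # A callable replacement avoids backslash processing of the delimiter.
    return re.sub(pattern, lambda m: delim, s).split(delim)

def split_dropping_groups(pattern, s):
    """Call re.split() and discard the captured-group items it interleaves."""
    n_groups = re.compile(pattern).groups
    # The result has the shape: text, g1..gn, text, g1..gn, ..., text.
    return re.split(pattern, s)[::n_groups + 1]

print(re.split(PATTERN, '1211'))               # ['1', '1', '2', '2', '11']
print(split_via_sub(PATTERN, '1211'))          # ['1', '2', '11']
print(split_dropping_groups(PATTERN, '1211'))  # ['1', '2', '11']
```

Both helpers need Python 3.7 or later, since that is where `re.split()` started splitting on empty matches.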
