[issue36397] re.split() incorrectly splitting on zero-width pattern

2019-03-23 Thread Matthew Barnett
Matthew Barnett added the comment: The list alternates between substrings (s, between the splits) and captures (c):
    ['1', '1', '2', '2', '11']
     -s-  -c-  -s-  -c-  -s--
You can use slicing to extract the substrings:
    >>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '12111')[ : : 2]
    ['1', '2', '111']
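The slicing idea can be generalized to patterns with any number of capturing groups. The following is a minimal sketch, not something proposed in the thread: the helper name `split_without_captures` is hypothetical, and it assumes Python 3.7+ so that `re.split()` splits on zero-width matches. It relies on `re.split()` emitting one entry per capturing group after each substring, so the substrings sit at a stride of `groups + 1`.

```python
import re


def split_without_captures(pattern, string):
    """Split like re.split(), but drop the captured groups from the result.

    Hypothetical helper for illustration only; assumes Python 3.7+.
    """
    compiled = re.compile(pattern)
    # For every split point, re.split() emits the preceding substring and
    # then one entry per capturing group, so the substrings are every
    # (groups + 1)-th item of the result.
    return compiled.split(string)[::compiled.groups + 1]


print(split_without_captures(r'(?<=(\d))(?!\1)(?=\d)', '12111'))
# ['1', '2', '111']
```

With a single capturing group the stride is 2, which matches the `[ : : 2]` slice in the comment above.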

[issue36397] re.split() incorrectly splitting on zero-width pattern

2019-03-23 Thread Elias Tarhini
Elias Tarhini added the comment: Thank you. Was too zeroed-in on the idea that it was from the zero-width pattern, and I forgot to consider the group. Looks like `re.sub(pattern, 'some-delim', s).split('some-delim')` is a way to do this if it's not possible to use a non-capturing group
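A minimal sketch of the `re.sub(...).split(...)` workaround mentioned above, assuming Python 3.7+. The function name `split_via_sub` and the choice of a NUL character as delimiter are illustrative assumptions; the approach only works if the delimiter never occurs in the input.

```python
import re


def split_via_sub(pattern, string, delim='\x00'):
    """Mark every match of `pattern` with `delim`, then split on `delim`.

    Hypothetical helper; `delim` is assumed never to appear in `string`.
    """
    # A function is used as the replacement so that the delimiter is
    # inserted literally, without backslash-escape processing.
    marked = re.sub(pattern, lambda match: delim, string)
    return marked.split(delim)


print(split_via_sub(r'(?<=(\d))(?!\1)(?=\d)', '1211'))
# ['1', '2', '11']
```

Because `str.split()` knows nothing about regex groups, the captured text never ends up in the result.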

[issue36397] re.split() incorrectly splitting on zero-width pattern

2019-03-21 Thread Matthew Barnett
Matthew Barnett added the comment: From the docs: """If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.""" The pattern does contain a capture, so that's why the result has additional '1' and '2'. Presum
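The documented behavior quoted above can be seen in a short sketch (assuming Python 3.7+ for the zero-width case; the variable names are illustrative). The first two calls use a simple comma pattern with and without a capturing group, and the third uses the pattern from this issue.

```python
import re

# Without a capturing group, only the substrings are returned.
plain = re.split(r',', 'a,b,c')

# With a capturing group, the captured text is interleaved with the substrings.
captured = re.split(r'(,)', 'a,b,c')

# The pattern from the issue matches zero-width positions, but its lookbehind
# contains a capturing group, so the captured digits appear in the result.
issue = re.split(r'(?<=(\d))(?!\1)(?=\d)', '1211')

print(plain)     # ['a', 'b', 'c']
print(captured)  # ['a', ',', 'b', ',', 'c']
print(issue)     # ['1', '1', '2', '2', '11']
```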

[issue36397] re.split() incorrectly splitting on zero-width pattern

2019-03-21 Thread Elias Tarhini
New submission from Elias Tarhini: I believe I've found a bug in the `re` module -- specifically, in the 3.7+ support for splitting on zero-width patterns. Compare Java's behavior...
    jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
    $1 ==> String[3] { "1", "2", "11" }
...with Python'