Matthew Barnett added the comment:
The list alternates between substrings (s, between the splits) and captures (c):
['1', '1', '2', '2', '11']
 -s-  -c-  -s-  -c-  -s--
You can use slicing to extract the substrings:
>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '12111')[::2]
['1', '2', '111']
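The slice step of 2 works here because the pattern has exactly one capture group. As a rough sketch of the same idea for any number of groups, the stride can be taken from the compiled pattern's `groups` attribute. The helper name `split_without_captures` is hypothetical, not something in the `re` module, and the sketch assumes Python 3.7+ so that zero-width matches split at all:

```python
import re

def split_without_captures(pattern, string):
    """Split like re.split, but drop the captured-group entries."""
    compiled = re.compile(pattern)
    # re.split emits one substring followed by one entry per capture group,
    # so the substrings sit at every (groups + 1)-th position.
    stride = compiled.groups + 1
    return compiled.split(string)[::stride]

print(split_without_captures(r'(?<=(\d))(?!\1)(?=\d)', '12111'))
# ['1', '2', '111']
```

With no capture groups the stride is 1, so the helper then behaves like a plain `re.split`.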
Elias Tarhini added the comment:
Thank you. I was too zeroed-in on the idea that it came from the zero-width
pattern, and I forgot to consider the group. It looks like
`re.sub(pattern, 'some-delim', s).split('some-delim')` is a way to do this if
it's not possible to use a non-capturing group.
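A minimal sketch of that substitute-then-split workaround, assuming Python 3.7+ and a delimiter that never occurs in the input. The function name `split_via_sub` and the choice of a NUL character as the delimiter are illustrative assumptions only:

```python
import re

PATTERN = r'(?<=(\d))(?!\1)(?=\d)'
DELIM = '\x00'  # assumption: this character never appears in the input

def split_via_sub(pattern, string, delim=DELIM):
    """Mark every match with a delimiter, then split on that delimiter."""
    if delim in string:
        raise ValueError('delimiter already occurs in the input')
    # A callable replacement avoids backslash-escape processing of delim.
    marked = re.sub(pattern, lambda match: delim, string)
    return marked.split(delim)

print(split_via_sub(PATTERN, '1211'))
# ['1', '2', '11']
```

Because `str.split` knows nothing about the regex groups, the captured text never reaches the result. The cost is an extra pass over the string and the need to pick a safe delimiter.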
Matthew Barnett added the comment:
From the docs:
"""If capturing parentheses are used in pattern, then the text of all groups in
the pattern are also returned as part of the resulting list."""
The pattern does contain a capture, so that's why the result has additional '1'
and '2'.
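For illustration, the documented behaviour is easier to see with an ordinary (non-zero-width) separator. The input string and variable names below are made up for the example:

```python
import re

text = 'a,b;c'

# No group: only the substrings between the separators are returned.
plain = re.split(r'[,;]', text)

# Capturing group: the text of each matched separator is returned as well.
captured = re.split(r'([,;])', text)

# Non-capturing group: grouping without the extra list entries.
non_capturing = re.split(r'(?:[,;])', text)

print(plain)          # ['a', 'b', 'c']
print(captured)       # ['a', ',', 'b', ';', 'c']
print(non_capturing)  # ['a', 'b', 'c']
```

The pattern in this issue cannot simply switch to a non-capturing group, because the backreference `\1` depends on the capture.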
Presum
New submission from Elias Tarhini:
I believe I've found a bug in the `re` module -- specifically, in the 3.7+
support for splitting on zero-width patterns. Compare Java's behavior...
jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
$1 ==> String[3] { "1", "2", "11" }
...with Python'