[issue17668] re.split loses characters matching ungrouped parts of a pattern

Tomasz J. Kotarba Mon, 08 Apr 2013 11:20:38 -0700

New submission from Tomasz J. Kotarba:

Tested in 2.7, but this possibly affects other versions as well.


A real-life example (note that the first character, '>', is lost):

>>> import re
>>> re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)')

produces:

['', 'Homo sapiens catenin (cadherin-associated)', '']


Expected (and IMHO most useful) behaviour would be for it to return:

['', '>Homo sapiens catenin (cadherin-associated)', '']

or (IMHO much less useful, as one can already get this result just by adding 
external grouping parentheses):

['', '>Homo sapiens catenin (cadherin-associated)', 'Homo sapiens catenin (cadherin-associated)', '']
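
For reference, here is a minimal sketch of the calls discussed above, using only the existing re.split(). The variable names are illustrative. The third call is one possible way to get the first expected result today, by moving the group so that it covers the whole match:

```python
import re

line = '>Homo sapiens catenin (cadherin-associated)'

# Reported behaviour: only the captured group is returned, so '>' is lost.
current = re.split(r'^>(.*)$', line)

# External grouping parentheses: the whole match and the inner group
# are both returned.
outer = re.split(r'^(>(.*))$', line)

# A single group around the whole pattern: only the whole match is returned.
whole_only = re.split(r'^(>.*)$', line)

print(current)
print(outer)
print(whole_only)
```

This only works when the pattern can be edited by hand; it does not help when the pattern is supplied by someone else and already contains groups.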

I am not sure whether this can be changed in such a mature and widely used 
module without breaking compatibility, but adding a new optional parameter 
that decides how re.split() deals with patterns containing grouping 
parentheses, defaulting to the current behaviour, would be very helpful.
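
To illustrate what such an option might do, here is a rough sketch written as a separate helper. The name split_keep_match is hypothetical and not part of the re module. It is built on re.finditer(), which is an assumption of this sketch and not how re.split() is implemented:

```python
import re

def split_keep_match(pattern, string, flags=0):
    """Hypothetical helper: split like re.split(), but return each whole
    match (group 0) between the pieces and ignore capture groups."""
    parts = []
    last_end = 0
    for m in re.finditer(pattern, string, flags):
        parts.append(string[last_end:m.start()])  # text before this match
        parts.append(m.group(0))                  # the entire matched text
        last_end = m.end()
    parts.append(string[last_end:])               # text after the last match
    return parts

print(split_keep_match(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)'))
```

Zero-width matches are passed through exactly as re.finditer() reports them, so results for patterns that can match the empty string may differ from re.split().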

Best Regards

----------
components: Regular Expressions
messages: 186324
nosy: ezio.melotti, mrabarnett, triquetra011
priority: normal
severity: normal
status: open
title: re.split loses characters matching ungrouped parts of a pattern
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17668>
_______________________________________