New submission from Tomasz J. Kotarba:

Tested in 2.7, but it possibly affects the other versions as well.

A real-life example (note that the leading '>' character is lost):

>>> import re
>>> re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)')
['', 'Homo sapiens catenin (cadherin-associated)', '']
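For context, the re documentation says that when the pattern contains capturing parentheses, the text of all groups is also returned as part of the resulting list; it says nothing about keeping the rest of the match. A small check, using only the standard re module, shows the difference between the whole match and what re.split() keeps:

```python
import re

s = '>Homo sapiens catenin (cadherin-associated)'

# A match object exposes the whole match (group 0) and the captured groups.
m = re.search(r'^>(.*)$', s)
print(m.group(0))  # whole match, including the leading '>'
print(m.groups())  # only the captured part, without '>'

# re.split() puts only the captured groups between the split pieces.
print(re.split(r'^>(.*)$', s))

# Same effect with a smaller pattern: the '-' characters are matched
# but not captured, so they do not appear in the result.
print(re.split(r'-(x)-', 'a-x-b'))  # ['a', 'x', 'b']
```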

Expected (and IMHO most useful) behaviour would be for it to return:

['', '>Homo sapiens catenin (cadherin-associated)', '']
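With the current API, one possible workaround (assuming the inner group is not needed on its own) is to move the capturing parentheses so that they enclose the '>' as well; re.split() then returns the whole match:

```python
import re

s = '>Homo sapiens catenin (cadherin-associated)'

# The single group now covers '>' too, so the full matched text
# ends up in the result list.
result = re.split(r'^(>.*)$', s)
print(result)  # ['', '>Homo sapiens catenin (cadherin-associated)', '']
```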

or (IMHO much less useful, as one can already get this one just by adding
external grouping parentheses):

['', '>Homo sapiens catenin (cadherin-associated)', 'Homo sapiens catenin (cadherin-associated)', '']
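For reference, the external-grouping variant mentioned above can be written as follows; groups are returned in the order of their opening parentheses, so the outer (whole-match) group comes before the inner one:

```python
import re

s = '>Homo sapiens catenin (cadherin-associated)'

# Outer group: the whole match including '>'.
# Inner group: the text after '>'.
result = re.split(r'^(>(.*))$', s)
print(result)
# ['', '>Homo sapiens catenin (cadherin-associated)',
#  'Homo sapiens catenin (cadherin-associated)', '']
```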

Not sure whether this can be changed in such a mature and widely used module
without breaking compatibility, but just adding a new optional parameter that
decides how re.split() deals with patterns containing grouping parentheses,
defaulting to the current behaviour, would be very helpful.
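No such parameter exists in re. Purely as an illustration of the idea, a hypothetical split_with_mode() helper built on re.finditer() could emulate it; the function name and the mode values ('groups', 'match', 'both') are made up here. Mode 'groups' is meant to reproduce the current re.split() output, under the assumption that the pattern never matches the empty string (empty-match handling and maxsplit are not covered):

```python
import re

_MODES = ("groups", "match", "both")


def split_with_mode(pattern, string, mode="groups", flags=0):
    """Hypothetical re.split() variant; NOT part of the re module.

    mode="groups": keep only captured groups (current re.split behaviour)
    mode="match":  keep the whole match instead of the groups
    mode="both":   keep the whole match, followed by the groups
    Assumes the pattern never matches the empty string.
    """
    if mode not in _MODES:
        raise ValueError("mode must be one of %r, got %r" % (_MODES, mode))
    regex = re.compile(pattern, flags)
    pieces = []
    last_end = 0
    for m in regex.finditer(string):
        pieces.append(string[last_end:m.start()])  # text before this match
        if mode in ("match", "both"):
            pieces.append(m.group(0))  # the full matched text
        if mode in ("groups", "both"):
            pieces.extend(m.groups())  # captured groups (None if unmatched)
        last_end = m.end()
    pieces.append(string[last_end:])  # text after the last match
    return pieces


s = '>Homo sapiens catenin (cadherin-associated)'
print(split_with_mode(r'^>(.*)$', s))  # same as re.split() here
print(split_with_mode(r'^>(.*)$', s, mode="match"))
print(split_with_mode(r'^>(.*)$', s, mode="both"))
```

Defaulting mode to 'groups' mirrors the suggestion that existing callers keep today's behaviour unchanged.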
Best Regards

components: Regular Expressions
messages: 186324
nosy: ezio.melotti, mrabarnett, triquetra011
priority: normal
severity: normal
status: open
title: re.split loses characters matching ungrouped parts of a pattern
type: behavior
versions: Python 2.7

Python tracker <>
Python-bugs-list mailing list