New submission from Tomasz J. Kotarba:

Tested in 2.7, but this possibly affects other versions as well.
A real-life example (note that the first character, '>', is lost):

>>> import re
>>> re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)')

produces:

['', 'Homo sapiens catenin (cadherin-associated)', '']

The expected (and IMHO most useful) behaviour would be for it to return:

['', '>Homo sapiens catenin (cadherin-associated)', '']

or (IMHO much less useful, as one can already get this just by adding external grouping parentheses):

['', '>Homo sapiens catenin (cadherin-associated)', 'Homo sapiens catenin (cadherin-associated)', '']

I am not sure whether this can be changed in such a mature and widely used module without breaking compatibility, but adding a new optional parameter that decides how re.split() deals with patterns containing grouping parentheses, and making it default to the current behaviour, would be very helpful.

Best Regards

----------
components: Regular Expressions
messages: 186324
nosy: ezio.melotti, mrabarnett, triquetra011
priority: normal
severity: normal
status: open
title: re.split loses characters matching ungrouped parts of a pattern
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17668>
_______________________________________
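
For reference, the behaviour reported above, and the results obtainable today by regrouping the pattern, can be reproduced with a short script. This is only a sketch: the helper `split_keep_match` is a hypothetical name used for illustration and is not part of the `re` module, and it does not try to mirror how `re.split()` treats empty matches.

```python
import re

header = '>Homo sapiens catenin (cadherin-associated)'

# Reported behaviour: only the text captured by the group is returned,
# so the leading '>' (matched outside the group) is dropped.
current = re.split(r'^>(.*)$', header)

# Wrapping the whole pattern in an outer group returns the full match
# followed by the inner group.
with_outer = re.split(r'^(>(.*))$', header)

# Grouping the whole pattern with no inner group returns only the full match.
whole_only = re.split(r'^(>.*)$', header)


def split_keep_match(pattern, string):
    """Hypothetical helper: split on `pattern`, keeping each whole match
    (group 0) between the pieces and ignoring any capturing groups."""
    parts = []
    last = 0
    for m in re.finditer(pattern, string):
        parts.append(string[last:m.start()])  # text before the match
        parts.append(m.group(0))              # the entire matched text
        last = m.end()
    parts.append(string[last:])               # text after the last match
    return parts


print(current)
print(with_outer)
print(whole_only)
print(split_keep_match(r'^>(.*)$', header))
```

The helper yields the "expected" list from the report without changing the pattern, which is roughly what an optional parameter on `re.split()` would have to do.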