[issue24601] bytes and unicode splitlines() methods differ on what is a line break

2015-07-10 Thread Gregory P. Smith
Gregory P. Smith added the comment:

hah, i should've searched the tracker first. looks like the other open issues cover this.

--
resolution: -> duplicate
status: open -> closed
superseder: -> str.splitlines splitting on non-\r\n characters
versions: +Python 2.7, Python 3.4, Python 3.

[issue24601] bytes and unicode splitlines() methods differ on what is a line break

2015-07-10 Thread Martin Panter
Martin Panter added the comment:

* Issue 7643: Originally a complaint about the difference, but was closed after adding more differences!
* Issue 22232: Documentation bug, but with some discussion on changing the API. Maybe a duplicate?
* Issue 22233: Email and HTTP message parsing bug related

[issue24601] bytes and unicode splitlines() methods differ on what is a line break

2015-07-09 Thread Steven D'Aprano
Steven D'Aprano added the comment:

On Fri, Jul 10, 2015 at 02:18:33AM +, Gregory P. Smith wrote:

> for bytes, \v (0x0b) is not considered a line break. for unicode, it is.
[...]
> I think these should be consistent.

I'm not sure that they should. Unicode includes other line breaks which b

[issue24601] bytes and unicode splitlines() methods differ on what is a line break

2015-07-09 Thread Gregory P. Smith
New submission from Gregory P. Smith:

for bytes, \v (0x0b) is not considered a line break. for unicode, it is. this traces back to the Objects/stringlib/ code where unicode defers to the decision made by Objects/unicodeobject.c's ascii_linebreak table which contains 7 line breaks in the 0..12
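
The difference described in this report can be reproduced with a short probe. This is only a sketch, assuming Python 3 (it relies on `bytes([code])` building a one-byte string); the helper names `splits_str` and `splits_bytes` are made up for illustration and are not part of CPython. It tests every ASCII code point to see which ones each `splitlines()` treats as a line break.

```python
# Assumption: Python 3. On Python 2, bytes([code]) behaves differently.

def splits_str(code):
    """True if str.splitlines() treats chr(code) as a line break."""
    return len(("a" + chr(code) + "b").splitlines()) == 2

def splits_bytes(code):
    """True if bytes.splitlines() treats the single byte `code` as a line break."""
    return len((b"a" + bytes([code]) + b"b").splitlines()) == 2

# Probe the ASCII range only; str.splitlines() also splits on some
# non-ASCII characters, which this loop does not cover.
str_breaks = [c for c in range(128) if splits_str(c)]
bytes_breaks = [c for c in range(128) if splits_bytes(c)]

print("str  :", [hex(c) for c in str_breaks])
print("bytes:", [hex(c) for c in bytes_breaks])

# The \v case from the report:
print("a\vb".splitlines())    # str splits on \v
print(b"a\vb".splitlines())   # bytes does not
```

On Python 3 the `str` list should hold seven entries, including 0x0b, while the `bytes` list should hold only 0x0a and 0x0d.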