New submission from Joseph Copenhaver <joseph.copenha...@gmail.com>:

The IO readlines() facility processes UTF-8 files incorrectly, for a reason I have not identified. Specifically, the call generates too many entries in the resulting lines array after the character sequence "\x85 blah", which gets cut as ("\x85 ", "blah") according to the resulting array.

My workaround for this issue is not elegant, especially since I need the newline characters:

    #BEGIN: WTF
    a_str_whole = fs_in.read()
    fs_in.close()
    a_str_lines = a_str_whole.split("\n")
    for idx in range(0,len(a_str_lines)-1):
        a_str_lines[idx]+="\n"
    #END: WTF

Attached is an example script that defines the problem clearly.

----------
components: IO, Interpreter Core, Regular Expressions, Unicode
files: ErrorProof-utf8-x85.py
messages: 113818
nosy: jcope
priority: normal
severity: normal
status: open
title: utf8 codec readlines error after "\x85 "
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file18508/ErrorProof-utf8-x85.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9593>
_______________________________________
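
A minimal sketch of the symptom described in this report, under one assumption the report itself does not confirm: that the extra entries come from Unicode-aware line splitting, which treats U+0085 (NEL, "\x85") as a line boundary. The helper name split_keep_newlines and the sample string are hypothetical and are not taken from the attached script. The sketch contrasts splitlines(True) on a Unicode string with a split that breaks only on "\n" and keeps each terminator.

```python
import re


def split_keep_newlines(text):
    """Split text on "\\n" only, keeping each trailing newline.

    Hypothetical helper for illustration; not part of the attached script.
    """
    # Match either a (possibly empty) run of non-newlines ending in "\n",
    # or a final non-empty run that has no terminating newline.
    return re.findall(r"[^\n]*\n|[^\n]+", text)


# Made-up sample containing U+0085 in the middle of the first line.
sample = u"first \x85 blah\nsecond\n"

# Unicode-aware splitting breaks at U+0085 as well as at "\n".
unicode_split = sample.splitlines(True)

# Splitting on "\n" alone leaves the U+0085 character inside its line.
newline_split = split_keep_newlines(sample)

print(len(unicode_split), len(newline_split))
```

Unlike the split("\n") loop in the workaround, this form does not append a "\n" to any line that lacked one, so joining the pieces reproduces the input exactly.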