New submission from Joseph Copenhaver <joseph.copenha...@gmail.com>:

The IO readlines() facility incorrectly processes UTF-8 files for some unknown
reason. Specifically, the call returns too many entries after the character
sequence "\x85 blah", which the resulting list shows cut into ("\x85 ", "blah").
My workaround for this issue is not elegant, especially since I need to keep
the newline characters:

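#BEGIN: WTF
# fs_in is the UTF-8 input file, already open for reading.
a_str_whole = fs_in.read()
fs_in.close()
# Split on "\n" only, then put the "\n" back on every piece except the
# last one (the text after the final newline).
a_str_lines = a_str_whole.split("\n")
for idx in range(len(a_str_lines) - 1):
    a_str_lines[idx] += "\n"
#END: WTF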
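The same workaround can be wrapped in a small helper. This is a minimal sketch in Python 3 syntax; `split_keep_newlines` is a hypothetical name, not part of any library. Unlike the loop above, it drops the empty final entry that `split("\n")` leaves when the text ends with a newline, which matches what `readlines()` returns for such input.

```python
def split_keep_newlines(text):
    """Split text on "\n" only, keeping each "\n" as readlines() does.

    Unlike str.splitlines(), this does not treat U+0085 (NEL) or other
    Unicode line boundaries as line ends.
    """
    parts = text.split("\n")
    # Every part except the last was followed by a "\n" in the input.
    lines = [part + "\n" for part in parts[:-1]]
    # Keep the trailing text only if the input did not end with "\n",
    # so no empty final entry is produced.
    if parts[-1]:
        lines.append(parts[-1])
    return lines


print(split_keep_newlines("abc\x85 blah\nnext\n"))  # ['abc\x85 blah\n', 'next\n']
```

By contrast, `"abc\x85 blah\nnext\n".splitlines(True)` gives three entries, ending the first one right after U+0085.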

Attached is an example script that defines the problem clearly.
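The attached script is not reproduced here. The following is a hedged sketch of the likely cause, assuming the file was read through a codecs stream reader (for example one returned by codecs.open()), and written for Python 3 rather than the reported 2.7. Codecs-based readers implement readlines() on top of str.splitlines(), which also ends a line at U+0085 (NEL), while io.TextIOWrapper in its default newline mode only ends lines at "\n", "\r" and "\r\n". The sample text is made up; only the "\x85 blah" sequence comes from the report.

```python
import codecs
import io

# U+0085 (NEL) is encoded in UTF-8 as the two bytes C2 85.
data = "abc\x85 blah\nnext\n".encode("utf-8")

# codecs StreamReader.readlines() decodes everything and then calls
# str.splitlines(), which treats U+0085 as a line boundary.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
codec_lines = reader.readlines()

# io.TextIOWrapper (what Python 3's built-in open() returns in text mode)
# only ends lines at "\n", "\r" or "\r\n" with the default newline=None.
wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
io_lines = wrapper.readlines()

print(codec_lines)  # ['abc\x85', ' blah\n', 'next\n']
print(io_lines)     # ['abc\x85 blah\n', 'next\n']
```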

----------
components: IO, Interpreter Core, Regular Expressions, Unicode
files: ErrorProof-utf8-x85.py
messages: 113818
nosy: jcope
priority: normal
severity: normal
status: open
title: utf8 codec readlines error after "\x85 "
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file18508/ErrorProof-utf8-x85.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9593>
_______________________________________
_______________________________________________
Python-bugs-list mailing list