New submission from Joseph Copenhaver:
The IO readlines() facility processes UTF-8 files incorrectly, for a reason I
have not been able to determine. Specifically, the call returns too many
entries in the resulting list of lines after a character sequence "\x85 blah",
which gets cut as ("\x85 ","blah") according to the resulting list. My
workaround for this issue is not elegant, especially since I need the newline
characters:
#BEGIN: WTF
a_str_whole = fs_in.read()
fs_in.close()
a_str_lines = a_str_whole.split("\n")
for idx in range(0, len(a_str_lines) - 1):
    a_str_lines[idx] += "\n"
#END: WTF
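
The same workaround can be written as a small helper. This is only a sketch:
the name split_keep_newlines is hypothetical and is not part of the report or
of any library. Unlike the snippet above, it drops the empty trailing entry
that split("\n") produces when the text ends with a newline.

```python
def split_keep_newlines(text):
    # Hypothetical helper: split on "\n" only and keep the terminator on
    # every line, so U+0085 is never treated as a line boundary.
    parts = text.split(u"\n")
    lines = [part + u"\n" for part in parts[:-1]]
    if parts[-1]:
        # Last line had no trailing newline; keep it as-is.
        lines.append(parts[-1])
    return lines
```

For example, split_keep_newlines(fs_in.read()) would replace the block between
the BEGIN and END markers, assuming fs_in yields decoded text.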
Attached is an example script that defines the problem clearly.
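
For readers without the attachment, the following is a minimal sketch of the
behavior. It is not the attached script. It assumes the file was read through
a codecs stream reader, which the report does not state, and the sample text
and variable names are invented for illustration. The extra split comes from
splitlines(), which treats U+0085 (NEXT LINE) as a line boundary, whereas the
io module's text layer only breaks on "\n", "\r" and "\r\n".

```python
import codecs
import io

# Invented sample text containing U+0085 in the middle of a line.
text = u"first\x85 blah\nsecond\n"
data = text.encode("utf-8")

# Unicode splitlines() breaks right after U+0085 as well as after "\n".
by_splitlines = text.splitlines(True)

# The codecs stream reader's readlines() splits the decoded text the same way.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
by_codecs = reader.readlines()

# The io text layer keeps U+0085 inside the line.
by_io = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").readlines()

print(len(by_splitlines), len(by_codecs), len(by_io))
```

Under these assumptions the first two lists have three entries and the last
has two, so opening the file with io.open(path, encoding="utf-8") may be an
alternative to the manual split.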
--
components: IO, Interpreter Core, Regular Expressions, Unicode
files: ErrorProof-utf8-x85.py
messages: 113818
nosy: jcope
priority: normal
severity: normal
status: open
title: utf8 codec readlines error after "\x85 "
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file18508/ErrorProof-utf8-x85.py
___
Python tracker
<http://bugs.python.org/issue9593>
___