
[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Joseph Copenhaver


New submission from Joseph Copenhaver :

The IO readlines() facility splits utf8 files incorrectly, for a reason I have 
not yet identified. Specifically, the call returns too many entries in the 
resulting list of lines: after the character sequence "\x85 blah", the text is 
cut into ("\x85 ","blah"). My workaround for this issue is not elegant, 
especially since I need to keep the newline characters:

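#BEGIN: WTF
# fs_in is the file object already opened for reading (decoded text)
a_str_whole = fs_in.read()
fs_in.close()
# Split on "\n" only, then restore the newline that split() removes.
# The last element is left alone: it is the unterminated tail (or "").
a_str_lines = a_str_whole.split("\n")
for idx in range(len(a_str_lines) - 1):
    a_str_lines[idx] += "\n"
#END: WTF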

Attached is an example script that defines the problem clearly.
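
For reference, the reported splitting can be reproduced without the attached 
script. The sketch below is not that attachment. It assumes the file contains 
the character U+0085 (NEL, encoded in UTF-8 as the bytes C2 85), which the 
splitlines() method of unicode strings treats as a line boundary. The sample 
text and variable names are made up for illustration. It compares a codecs 
stream reader with io.TextIOWrapper opened with newline="":

```python
import codecs
import io

# Hypothetical sample: one U+0085 character in the middle of the first line.
sample_bytes = u"first\x85 blah\nsecond\n".encode("utf-8")

# codecs stream reader: readlines() decodes everything and then calls
# splitlines(True), so U+0085 ends a line as well.
codecs_reader = codecs.getreader("utf-8")(io.BytesIO(sample_bytes))
codecs_lines = codecs_reader.readlines()

# io text layer with newline="": only "\n", "\r" and "\r\n" end a line,
# and the line endings are returned untranslated.
io_reader = io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8", newline="")
io_lines = io_reader.readlines()

print(len(codecs_lines))  # 3 entries: the first line is cut at U+0085
print(len(io_lines))      # 2 entries: one per "\n"
```

If the codecs module is not a hard requirement, reading through the io module 
in this way avoids the extra split while still keeping the newline characters.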

--
components: IO, Interpreter Core, Regular Expressions, Unicode
files: ErrorProof-utf8-x85.py
messages: 113818
nosy: jcope
priority: normal
severity: normal
status: open
title: utf8 codec readlines error after "\x85 "
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file18508/ErrorProof-utf8-x85.py

___
Python tracker 
<http://bugs.python.org/issue9593>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Joseph Copenhaver


Joseph Copenhaver  added the comment:

I now recognize that the issue lies with the file format and not with Python, 
but the area where this code will be used requires the use of the codecs module.
Is there any way to keep the efficiency of the chunked reading that codecs I/O 
readlines() performs while specifying the list of characters to treat as line 
delimiters? Can the line delimiter be changed in Python, as it can in Perl?

Thanks for the quick response.
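
One possible approach to the question above, sketched under assumptions: read 
decoded text in fixed-size chunks and split only on a chosen delimiter. The 
helper name iter_lines, its parameters, and the sample data are hypothetical 
and not part of the codecs module. It handles a single delimiter string 
(similar in spirit to setting one record separator) rather than a list of 
characters:

```python
import codecs
import io


def iter_lines(stream, delimiter=u"\n", chunk_size=8192):
    """Yield delimiter-terminated lines from a decoded text stream.

    The stream is read in chunks, so the whole file is never held in
    memory at once. Only `delimiter` ends a line; U+0085 does not.
    """
    buffer = u""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty string means end of stream
            break
        buffer += chunk
        parts = buffer.split(delimiter)
        buffer = parts.pop()  # keep the incomplete tail for the next round
        for part in parts:
            yield part + delimiter
    if buffer:
        yield buffer  # final line without a trailing delimiter


# Usage with a codecs stream reader over in-memory sample bytes.
data = u"first\x85 blah\nsecond\nlast".encode("utf-8")
reader = codecs.getreader("utf-8")(io.BytesIO(data))
lines = list(iter_lines(reader, chunk_size=4))
print(lines)
```

A small chunk_size is used here only to exercise the chunk boundaries; a 
larger value is more sensible for real files. Because the unfinished tail is 
carried over between reads, a multi-character delimiter that straddles two 
chunks is still recognized.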


[issue9593] utf8 codec readlines error after "\x85 "

2010-08-13 Thread Joseph Copenhaver


Joseph Copenhaver  added the comment:

It is better, thanks.

