
[issue17090] io.TextIOWrapper does not handle UTF-8 encoded streams correctly

2013-01-31 Thread Christoph Rauch


New submission from Christoph Rauch:

I have uncovered a strange behavior in io.TextIOWrapper which I think is a bug.

#!/usr/bin/env python
# encoding: utf-8

import csv
import io


raw_file = io.FileIO('utf-8-encoded.csv', 'rb')
stream = io.BufferedReader(raw_file)
stream = io.TextIOWrapper(stream, encoding="UTF-8")
reader = csv.reader(stream, delimiter=";")

cells = 0

for row in reader:
    # Cells should contain 4 Unicode characters.
    assert all([len(cell.decode('utf-8')) == 4 for cell in row]), row
    cells += len(row)

assert cells == 210, cells

This produces a not very useful traceback:

Traceback (most recent call last):
  File "utf8-textio-test.py", line 15, in <module>
    for row in reader:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)

The only way to keep it from crashing is to set encoding to ascii and errors to
ignore, but that strips out every character with ord > 127, which is clearly
not useful either, so I hope this behavior is not intended.

I have attached a file with which to test this problem.

--
components: IO
files: utf-8-encoded.csv
messages: 181028
nosy: Christoph.Rauch
priority: normal
severity: normal
status: open
title: io.TextIOWrapper does not handle UTF-8 encoded streams correctly
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file28922/utf-8-encoded.csv

___
Python tracker 
<http://bugs.python.org/issue17090>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue17090] io.TextIOWrapper does not handle UTF-8 encoded streams correctly

2013-01-31 Thread Christoph Rauch


Christoph Rauch added the comment:

Thanks for the information. I will work around that. I misdiagnosed the problem.
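
For reference, one possible workaround is sketched below. It is not taken from
this thread. It rests on the documented limitation that the Python 2.7 csv
module does not support Unicode input, so on Python 2 the reader is fed the raw
UTF-8 bytes and each cell is decoded afterwards, while on Python 3 the
TextIOWrapper approach from the report works directly. The helper name
read_rows and the sample data are hypothetical.

```python
import csv
import io
import sys


def read_rows(binary_stream, encoding="utf-8", delimiter=";"):
    """Yield rows of text cells from a binary CSV stream (hypothetical helper)."""
    if sys.version_info[0] == 2:
        # Python 2: csv only handles byte strings, so parse the bytes
        # first and decode every cell afterwards.
        for row in csv.reader(binary_stream, delimiter=delimiter):
            yield [cell.decode(encoding) for cell in row]
    else:
        # Python 3: csv expects text, so wrapping the stream is correct.
        # newline="" lets the csv module handle line endings itself.
        text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
        for row in csv.reader(text, delimiter=delimiter):
            yield row


# Made-up sample data standing in for the attached file.
sample = u"\u00e4\u00f6\u00fc\u00df;abcd\r\n".encode("utf-8")
rows = list(read_rows(io.BytesIO(sample)))
```

Parsing the bytes before decoding is safe for UTF-8 because the delimiter,
quote, and newline characters are plain ASCII and never occur inside a
multi-byte sequence.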

