Sibylle Koczian wrote:
Peter Otten wrote:
I'd preprocess the rows as I tend to prefer the simplest approach I can come
up with. Example:
def recode_rows(rows, source_encoding, target_encoding):
    def recode(field):
        if isinstance(field, unicode):
            return field.encode(target_encoding)
        elif isinstance(field, str):
            return unicode(field, source_encoding).encode(target_encoding)
        return unicode(field).encode(target_encoding)
    return (map(recode, row) for row in rows)
For this case isinstance really seems to be quite reasonable. And it was
silly of me not to think of sys.stdout as file object for the example!
import csv
import sys

rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
writer = csv.writer(sys.stdout)
writer.writerows(recode_rows(rows, "latin1", "utf-8"))
The only limitation I can see: target_encoding probably has to be a superset
of ASCII.
Coping with umlauts and accents is quite enough for me.
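
The ASCII-superset limitation can be illustrated with a small simulation. This is only a sketch written for Python 3, not the csv module itself: join_encoded_fields and round_trips are hypothetical helpers that mimic how the Python 2 writer places a single-byte ASCII delimiter and line terminator between fields that have already been encoded.

```python
def join_encoded_fields(fields, encoding):
    # Hypothetical stand-in for the Python 2 csv writer: the fields arrive
    # already encoded, while the delimiter and line terminator are plain
    # ASCII bytes.
    return b",".join(field.encode(encoding) for field in fields) + b"\r\n"


def round_trips(fields, encoding):
    # True if the assembled line decodes back to the expected text.
    line = join_encoded_fields(fields, encoding)
    try:
        return line.decode(encoding) == ",".join(fields) + "\r\n"
    except UnicodeDecodeError:
        return False


fields = ["äöü", "x"]
results = {enc: round_trips(fields, enc)
           for enc in ("utf-8", "latin1", "utf-16-le")}
```

Under these assumptions, the ASCII-compatible encodings (utf-8, latin1) survive the round trip, while utf-16-le does not, because the one-byte separators do not fit its two-byte code units.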
This problem really goes away with Python 3 (tried it on another
machine), but something else changes too: in Python 2.6 the
documentation for the csv module explicitly says "If csvfile is a file
object, it must be opened with the ‘b’ flag on platforms where that
makes a difference." The documentation for Python 3.1 doesn't have this
sentence, and if I open the file with the 'b' flag in Python 3.1, I get the
following error for all sorts of data, even for a list containing only one
integer literal:
TypeError: must be bytes or buffer, not str
I don't really understand that.
In Python 3, a file opened in 'b' mode is for reading and writing bytes
with no encoding/decoding. I believe csv works with files in text mode,
as it returns strings when reading and expects strings when writing. Perhaps
the csv doc should say that the file must not be opened in 'b' mode. Not sure.
tjr
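
The text-mode versus 'b'-mode difference described above can be checked with a short Python 3 sketch. The helper names write_csv_text_mode and try_write_csv_binary_mode are made up for this illustration; the newline="" argument follows the recommendation in the current Python 3 csv documentation, and the exact wording of the TypeError message varies between Python 3 versions.

```python
import csv
import os
import tempfile

rows = [[1.23], ["äöü"], ["ÄÖÜ"], [1, 2, 3]]


def write_csv_text_mode(path, rows, encoding="utf-8"):
    # Text mode: the file object does the encoding; newline="" leaves the
    # line endings to the csv module.
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def try_write_csv_binary_mode(path, rows):
    # Binary mode: the writer passes str to a file that accepts only bytes.
    # Returns the TypeError that was raised, or None if nothing went wrong.
    try:
        with open(path, "wb") as f:
            csv.writer(f).writerows(rows)
    except TypeError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.csv")
    write_csv_text_mode(text_path, rows)
    with open(text_path, "rb") as f:
        written_bytes = f.read()
    binary_error = try_write_csv_binary_mode(os.path.join(tmp, "bin.csv"), [[1]])
```

Here written_bytes holds the UTF-8 encoded CSV data, and binary_error holds the TypeError raised even for a row containing a single integer.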