Re: csv and mixed lists of unicode and numbers

Terry Reedy Tue, 24 Nov 2009 15:03:13 -0800

Sibylle Koczian wrote:

Peter Otten wrote:

I'd preprocess the rows, as I tend to prefer the simplest approach I can come up with. Example:


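def recode_rows(rows, source_encoding, target_encoding):
    # Python 2: yield each row with every field as a byte string
    # in target_encoding.
    def recode(field):
        if isinstance(field, unicode):
            return field.encode(target_encoding)
        elif isinstance(field, str):
            # Byte string: decode from source_encoding, then re-encode.
            return unicode(field, source_encoding).encode(target_encoding)
        # Numbers and other objects: convert to text first.
        return unicode(field).encode(target_encoding)

    return (map(recode, row) for row in rows)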


For this case isinstance really seems to be quite reasonable. And it was
silly of me not to think of sys.stdout as file object for the example!

import csv
import sys

rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
writer = csv.writer(sys.stdout)
writer.writerows(recode_rows(rows, "latin1", "utf-8"))

The only limitation I can see: target_encoding probably has to be a superset of ASCII.
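
To illustrate that limitation, here is a rough sketch in Python 3 syntax. It assumes that the Python 2 writer effectively glues the already-encoded fields together with single-byte ASCII delimiters and line terminators; join_fields and round_trips are hypothetical helpers that imitate this, not part of the csv module.

```python
def join_fields(fields, target_encoding):
    # Imitation of the assumed Python 2 behaviour: encoded fields joined
    # by a one-byte ASCII comma and terminated by ASCII "\r\n".
    return b",".join(f.encode(target_encoding) for f in fields) + b"\r\n"


def round_trips(fields, target_encoding):
    # True if the mixed byte string decodes back to the intended line.
    data = join_fields(fields, target_encoding)
    try:
        return data.decode(target_encoding) == ",".join(fields) + "\r\n"
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for encoding in ("utf-8", "latin1", "utf-16-le"):
        print(encoding, round_trips(["ä", "b"], encoding))
```

With an ASCII-compatible encoding such as utf-8 or latin1 the line survives; with utf-16-le the single-byte comma and line terminator no longer line up with the two-byte code units, so the result cannot be decoded as intended.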


Coping with umlauts and accents is quite enough for me.

This problem really goes away with Python 3 (tried it on another
machine), but something else changes too: in Python 2.6 the
documentation for the csv module explicitly says "If csvfile is a file
object, it must be opened with the ‘b’ flag on platforms where that
makes a difference." The documentation for Python 3.1 doesn't have this
sentence, and if I do that in Python 3.1 I get for all sorts of data,
even for a list with only one integer literal:

TypeError: must be bytes or buffer, not str

I don't really understand that.

In Python 3, a file opened in 'b' mode is for reading and writing bytes with no encoding/decoding. I believe csv works with files in text mode, as it returns and expects strings/text for reading and writing. Perhaps the csv doc should say that the file must not be opened in 'b' mode. Not sure.
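
A minimal sketch of the text-mode approach in Python 3, for what it's worth. The helper names (write_rows, read_rows, binary_mode_fails) and the file name are made up for illustration. The file is opened in text mode with newline="" so that the csv module handles line endings itself, and the encoding is given to open() instead of being applied field by field.

```python
import csv
import io
import os
import tempfile


def write_rows(path, rows, encoding="utf-8"):
    # Text mode: the file object does the encoding, csv only sees str.
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def read_rows(path, encoding="utf-8"):
    # Every field comes back as a string, including the numbers.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


def binary_mode_fails():
    # A binary stream accepts only bytes, but csv.writer hands it str,
    # so the write raises TypeError.
    try:
        csv.writer(io.BytesIO()).writerow([1])
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "mixed.csv")
    write_rows(path, [[1.23], ["äöü"], [1, 2, 3]])
    print(read_rows(path))
    print(binary_mode_fails())
```

Mixed rows of text and numbers need no recoding step here, since the writer converts non-string fields to strings and the file object encodes the result.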


tjr

