[issue40762] Writing bytes using CSV module results in b prefixed strings

Steven D'Aprano Tue, 26 May 2020 20:02:03 -0700

Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

On further thought, no, I don't think it would be a reasonable feature.


The user opens the CSV file, probably with the default encoding (UTF-8?), 
but potentially with any encoding at all.

They collect some data as bytes. Those bytes could be in any encoding, 
and the csv module has no way of knowing which. If the writer decoded 
them implicitly using the file's encoding, then at best the user would 
get an explicit but confusing exception that the decoding failed, and 
at worst they would get data loss (mojibake).

    # Latin-1 to UTF-8 fails
    py> b = 'ßæ'.encode('latin-1')
    py> b.decode('utf-8')
    # raises UnicodeDecodeError: 'utf-8' codec can't decode 
    # byte 0xdf in position 0: invalid continuation byte

    # UTF-8 to Latin-1 loses data
    py> b = 'ßæ'.encode('UTF-8')
    py> b.decode('latin-1')
    # returns mojibake 'Ã\x9fÃ¦'

Short of outright banning the use of bytes (raise a TypeError), I think 
the current behaviour is least-worst.
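
For illustration, here is a minimal sketch of what the current behaviour 
looks like next to the explicit alternative, where the caller decodes 
with the encoding they know to be correct. The variable names and the 
in-memory io.StringIO buffers are my own assumptions for the example, 
not anything taken from the original report.

```python
import csv
import io

# Bytes whose encoding only the caller knows (Latin-1 here).
raw = 'ßæ'.encode('latin-1')

# Current behaviour: the writer stringifies non-string fields with
# str(), so the bytes object is written as its repr, b prefix included.
buf = io.StringIO()
csv.writer(buf).writerow([raw, 'text'])
current = buf.getvalue()

# Explicit alternative: the caller decodes before writing, using the
# encoding they know the data is in.
buf = io.StringIO()
csv.writer(buf).writerow([raw.decode('latin-1'), 'text'])
decoded = buf.getvalue()

print(repr(current))
print(repr(decoded))
```

The b prefixed output is ugly, but it is at least visible and lossless, 
whereas an implicit decode would have to guess the encoding.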

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40762>
_______________________________________