[issue691291] codecs.open(filename, 'U', 'UTF-16') corrupts text

And Clover Wed, 04 Feb 2009 17:42:27 -0800

And Clover <[email protected]> added the comment:

> The problem is that codecs.open() forces binary mode on the underlying
> file object, and this defeats the U mode.


Actually the problem is it doesn't defeat it!

The function is documented to force binary mode, but all it actually
does is "mode = mode + 'b'", which can leave you with a mode of 'rUb'.
This mode should be invalid, but in practice the 'U' wins out, so
newline translation is applied to the raw bytes before decoding. That
causes the expected problems for UTF-16 and some East Asian codecs.
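
To illustrate the failure, here is a minimal sketch that runs on any
current Python. It does not call codecs.open() itself; build_mode() and
universal_newlines_on_bytes() are hypothetical helpers that only imitate
the mode handling described above and the effect of 'U' on a raw byte
stream.

```python
def build_mode(mode):
    # Hypothetical stand-in for the mode handling described above:
    # 'b' is appended, but 'U' is left in place.
    return mode + 'b'


def universal_newlines_on_bytes(data):
    # Imitates universal newline translation applied to raw bytes,
    # i.e. before any decoding has happened.
    return data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


# 'rU' becomes 'rUb' rather than a clean binary mode.
resulting_mode = build_mode('rU')

# In UTF-16 the CR and LF are each followed by a NUL byte, so the
# byte pair CR LF never appears and each half is translated alone.
text = 'one\r\ntwo'
raw = text.encode('utf-16-le')
decoded = universal_newlines_on_bytes(raw).decode('utf-16-le')
# decoded is 'one\n\ntwo': one line break has become two.

# A single character whose encoding happens to contain 0x0D is
# silently turned into a different character.
changed = universal_newlines_on_bytes('\u0d0a'.encode('utf-16-le'))
changed_char = changed.decode('utf-16-le')
# changed_char is '\u0a0a', not the original '\u0d0a'.
```

Translation done after decoding would have produced 'one\ntwo' and left
U+0D0A untouched.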

Until such time as text/universal mode is supported at the overlying
decoded stream level, I suggest that 'U' should be .replace()d out of
the mode as well as 'b' being added, as the documentation would imply.
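
Something along these lines is what I have in mind. This is only a
sketch: force_binary_mode() is a hypothetical helper name, not an
existing function in the codecs module, and the fallback to 'r' when
nothing but 'U' was given is my assumption about what a bare 'U' should
mean.

```python
def force_binary_mode(mode):
    # Hypothetical helper: drop 'U' so no newline translation is
    # applied to the undecoded bytes.
    mode = mode.replace('U', '')
    # Assumption: a mode that was only 'U' is meant as a read mode.
    if mode[:1] not in ('r', 'w', 'a'):
        mode = 'r' + mode
    # Force binary, as the documentation says codecs.open() does.
    if 'b' not in mode:
        mode = mode + 'b'
    return mode
```

With this, both 'U' and 'rU' come out as 'rb', and modes that are
already binary, such as 'wb', pass through unchanged.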

----------
nosy: +aclover

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue691291>
_______________________________________