And Clover <a...@doxdesk.com> added the comment:

> The problem is that codecs.open() forces binary mode on the underlying file object, and this defeats the U mode.
Actually, the problem is that it doesn't defeat it! The function is documented to force binary mode, but all it actually does is "mode = mode + 'b'", which can leave you with a mode of 'rUb'. This mode should be invalid, but in practice the 'U' wins out and causes the expected problems for UTF-16 and some East Asian codecs.

Until text/universal mode is supported at the level of the overlying decoded stream, I suggest that 'U' should be .replace()d out of the mode as well as 'b' being added, as the documentation implies.

----------
nosy: +aclover

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue691291>
_______________________________________
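
To make the mode handling discussed in the comment concrete, here is a minimal sketch of the two behaviours. It works on mode strings only and does not open any file. The helper names `naive_binary_mode` and `strict_binary_mode` are hypothetical and are not part of the codecs module; the "'b' not in mode" guard is also an assumption added so that an already-binary mode is left alone.

```python
def naive_binary_mode(mode):
    """Behaviour described in the comment: 'b' is appended, 'U' is left in place."""
    if 'b' not in mode:          # assumed guard, see the note above
        mode = mode + 'b'
    return mode


def strict_binary_mode(mode):
    """Suggested behaviour: remove 'U' as well as adding 'b'."""
    mode = mode.replace('U', '')  # strip the universal-newlines flag
    if 'b' not in mode:
        mode = mode + 'b'
    return mode


print(naive_binary_mode('rU'))   # rUb
print(strict_binary_mode('rU'))  # rb
```

With the second variant no newline translation can be applied to the raw bytes before they reach the decoder, which is where the UTF-16 and East Asian codec problems mentioned above would come from.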