STINNER Victor <vstin...@redhat.com> added the comment:

Victor:
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates differently
> for some reasons.

Eryk:
> Do you mean valid UTF-16 surrogate pairs? (...)

Code page 65001 handles lone surrogates differently on Windows XP and older. It changed in Windows Vista:
https://unicodebook.readthedocs.io/operating_systems.html#encode-and-decode-functions

Steve Dower removed support for Vista from test_codecs.py 3 years ago:

commit f5aba58480bb0dd45181f609487ac2ecfcc98673
Author: Steve Dower <steve.do...@microsoft.com>
Date:   Tue Sep 6 19:42:27 2016 -0700

    Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup

Maybe it's time to remove Lib/encodings/cp65001.py and add an alias cp65001 => utf_8 in Lib/encodings/aliases.py? See bpo-32592.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36778>
_______________________________________
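
For reference, a minimal sketch of the lone-surrogate behaviour discussed in this comment, using only Python's built-in utf-8 codec. It does not call the Windows code page API. The variable names are illustrative, and what codecs.lookup("cp65001") returns is an assumption that depends on the Python version and platform, so no result is claimed for it here.

```python
import codecs

# A lone low surrogate: not a valid Unicode scalar value on its own.
lone = "\udc80"

# The strict utf-8 codec refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The "surrogatepass" error handler encodes it anyway, as three bytes.
passed = lone.encode("utf-8", "surrogatepass")

# Check what the name "cp65001" resolves to on this interpreter.
# Depending on version and platform this may be a dedicated codec,
# an alias of utf-8, or a LookupError.
try:
    cp65001_name = codecs.lookup("cp65001").name
except LookupError:
    cp65001_name = None

print("strict utf-8 accepts lone surrogate:", strict_ok)
print("surrogatepass bytes:", passed)
print("cp65001 resolves to:", cp65001_name)
```

If cp65001 is aliased to utf_8 as proposed, the last line would be expected to report the utf-8 codec.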