Eryk Sun <eryk...@gmail.com> added the comment:
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates
> differently for some reasons.

Do you mean valid UTF-16 surrogate pairs? For example:

    >>> codecs.code_page_encode(65001, '\ud800\udc00')
    (b'\xf0\x90\x80\x80', 2)

PyUnicode_AsUnicodeAndSize is neutral about storing surrogate codes in a 16-bit wchar_t string. In particular, the Python string in this case contains two surrogate codes, but they're passed to WideCharToMultiByte as a UTF-16 surrogate pair for the single character U+10000.

Anyway, it seems to me this issue will be resolved if cp65001.py is rewritten without functools.partial.

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36778>
_______________________________________
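
The difference between two surrogate codes in a Python string and a UTF-16 surrogate pair can be reproduced on any platform with a minimal sketch. It does not call codecs.code_page_encode, which is Windows-only. As an assumption of this illustration, the 'utf-16-le' codec with the 'surrogatepass' error handler stands in for the 16-bit wchar_t buffer; this is not necessarily what the cp65001 codec does internally.

```python
# A Python str holding two separate surrogate code points (not one character).
s = '\ud800\udc00'
assert len(s) == 2

# The strict UTF-8 codec refuses surrogate code points in a str.
try:
    s.encode('utf-8')
    strict_utf8_ok = True
except UnicodeEncodeError:
    strict_utf8_ok = False

# Stand-in for the 16-bit wchar_t buffer (an assumption of this sketch):
# write the two codes out as raw UTF-16 code units.
utf16_units = s.encode('utf-16-le', 'surrogatepass')

# Read as UTF-16, those code units are a valid surrogate pair for U+10000.
combined = utf16_units.decode('utf-16-le')

# Encoding that single character as UTF-8 gives the same four bytes
# shown in the code_page_encode example.
utf8_bytes = combined.encode('utf-8')

print(strict_utf8_ok, utf16_units, hex(ord(combined)), utf8_bytes)
```

The point of the sketch is only that the four-byte result is ordinary UTF-8 for U+10000 once the two codes are interpreted as a pair.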