Eryk Sun <eryk...@gmail.com> added the comment:

> cp65001 is *not* utf-8: Microsoft decided to handle surrogates 
> differently for some reasons.

Do you mean valid UTF-16 surrogate pairs? For example:

    >>> codecs.code_page_encode(65001, '\ud800\udc00')
    (b'\xf0\x90\x80\x80', 2)

PyUnicode_AsUnicodeAndSize does not treat surrogate codes specially when it 
copies a string into a 16-bit wchar_t buffer. In this case the Python string 
contains two separate surrogate codes, but WideCharToMultiByte receives them 
as a UTF-16 surrogate pair that encodes the single character U+10000.
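
The same effect can be sketched without the Windows API. The example below is 
only an illustration: it stands in for the wchar_t conversion by encoding with 
the 'surrogatepass' error handler, which is an assumption of this sketch and 
not what code_page_encode does internally.

```python
# Two separate surrogate code points in a Python str (length 2).
s = '\ud800\udc00'

# Stand-in for the 16-bit wchar_t buffer: write each code as a UTF-16 unit.
# 'surrogatepass' is used here only so the lone surrogates can be encoded.
utf16 = s.encode('utf-16-le', 'surrogatepass')

# Read back as UTF-16, the two units form a valid surrogate pair.
ch = utf16.decode('utf-16-le')

# The resulting single character encodes to the same bytes shown above.
utf8 = ch.encode('utf-8')

print(len(s), utf16, hex(ord(ch)), utf8)
```

Encoding s directly with the strict 'utf-8' codec raises UnicodeEncodeError, 
because there the surrogates are never combined into a pair.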

Anyway, it seems to me this issue will be resolved if cp65001.py is rewritten 
without functools.partial.
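
A minimal sketch of what that could look like, assuming the module currently 
builds its encode and decode callables with functools.partial over 
codecs.code_page_encode and codecs.code_page_decode. The function names and 
layout here are illustrative, not the actual patch, and the code page 
functions exist only on Windows.

```python
import codecs

# Plain functions instead of functools.partial objects. The codecs
# attributes are looked up at call time, so importing this is harmless
# on platforms where the code page functions are missing.
def encode(input, errors='strict'):
    return codecs.code_page_encode(65001, input, errors)

def decode(input, errors='strict'):
    # final=True: treat the input as a complete byte sequence.
    return codecs.code_page_decode(65001, input, errors, True)
```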

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36778>
_______________________________________