[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

2018-06-21 Thread STINNER Victor
STINNER Victor added the comment:

> I don't see anything wrong.

I wrote a C function to test _Py_DecodeUTF8Ex():

* surrogateescape=0 fails with a decoding error, as expected
* surrogateescape=1 escapes the bytes, as expected, as: '\udced\udcb2\udc80'

Ok, I just misunderstood the code: the decod
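The escaped result quoted in this comment follows the surrogateescape rule from PEP 383: each undecodable byte in the range 0x80..0xFF becomes the lone surrogate U+DC00 plus the byte value. The sketch below reproduces that mapping for the bytes 0xED 0xB2 0x80. It is only an illustration: `escape_byte` and `escape_bytes` are hypothetical helper names, not CPython functions, and the code does not attempt any UTF-8 decoding itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not CPython code): map one undecodable byte
   (0x80..0xFF) to the lone surrogate U+DC80..U+DCFF. */
static uint16_t escape_byte(unsigned char b)
{
    assert(b >= 0x80);              /* ASCII bytes are never escaped */
    return (uint16_t)(0xDC00 + b);
}

/* Hypothetical helper: escape every byte of an undecodable run.
   Writes n code units to out and returns n. */
static size_t escape_bytes(const unsigned char *in, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = escape_byte(in[i]);
    }
    return n;
}
```

Applied to the three bytes 0xED, 0xB2, 0x80, this yields the code units 0xDCED, 0xDCB2, 0xDC80, which matches the string '\udced\udcb2\udc80' reported above.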

[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

2018-06-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I don't see anything wrong.

[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

2018-06-21 Thread STINNER Victor
STINNER Victor added the comment:

> Could you show an example please?

I saw an issue when reading the code, I didn't try to trigger the issue using real code yet.

[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

2018-06-21 Thread STINNER Victor
STINNER Victor added the comment: Extract of _Py_DecodeUTF8Ex() code, there is an explicit "write a surrogate pair" comment:

    #if SIZEOF_WCHAR_T == 4
            ch = ucs4lib_utf8_decode(&s, e, (Py_UCS4 *)unicode, &outpos);
    #else
            ch = ucs2lib_utf8_decode(&s, e, (Py_UCS2 *)unicode, &outpos)
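For context on what "write a surrogate pair" means when wchar_t is 16 bits wide: a code point above U+FFFF does not fit in one 16-bit unit, so UTF-16 stores it as a high surrogate followed by a low surrogate. The sketch below shows that standard UTF-16 arithmetic. It is an assumption-labeled illustration, not the CPython implementation: `to_surrogate_pair` is a hypothetical name, and the part of _Py_DecodeUTF8Ex() that actually writes the pair is not shown in the extract above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not CPython code): split a code point in the
   range U+10000..U+10FFFF into a UTF-16 high/low surrogate pair. */
static void to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                              /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
}
```

For example, U+1F600 splits into 0xD83D and 0xDE00. A pair like this represents one valid character, which is different from the lone surrogates U+DC80..U+DCFF that surrogateescape uses to carry undecodable bytes.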

[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

2018-06-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you show an example please?

[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

2018-06-21 Thread STINNER Victor
New submission from STINNER Victor: _Py_DecodeUTF8Ex() creates surrogate pairs with 16-bit wchar_t (on Windows), whereas input bytes should be escaped. I'm quite sure that it's a bug.

-- components: Interpreter Core messages: 320154 nosy: serhiy.storchaka, vstinner priority: normal se