Eryk Sun <eryk...@gmail.com> added the comment:
In Unix, Python 3.6 decodes the char * command-line arguments via mbstowcs. In Linux, I see the following misbehavior of mbstowcs when decoding an overlong UTF-8 sequence:

    >>> import ctypes
    >>> mbstowcs = ctypes.CDLL(None, use_errno=True).mbstowcs
    >>> arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])
    >>> mbstowcs(None, arg, 0)
    1
    >>> buf = (ctypes.c_int * 2)()
    >>> mbstowcs(buf, arg, 2)
    1
    >>> hex(buf[0])
    '0x7fffbeba'

This shouldn't be an issue in 3.7, at least not with the default UTF-8 mode configuration. With this mode, Py_DecodeLocale calls _Py_DecodeUTF8Ex using the surrogateescape error handler [1].

[1]: https://github.com/python/cpython/blob/v3.7.2/Python/fileutils.c#L456

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35883>
_______________________________________
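For comparison with the mbstowcs result in the message, here is a rough sketch of how the same six bytes behave under Python's own UTF-8 codec with the surrogateescape error handler. It uses the Python-level codec as a stand-in for the C-level _Py_DecodeUTF8Ex path, which is an assumption made for illustration; the names `arg`, `decoded` and `strict_failed` are just local variables.

```python
# Same byte string as in the message: FD BF BF BB BA BA.
arg = bytes(x + 128 for x in [1 + 124, 63, 63, 59, 58, 58])

# Strict UTF-8 decoding rejects the sequence outright.
try:
    arg.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape, each undecodable byte 0xNN is mapped to the
# lone surrogate U+DCNN instead of being combined into one wide value.
decoded = arg.decode("utf-8", "surrogateescape")
print([hex(ord(c)) for c in decoded])

# The mapping is reversible, so the original bytes can be recovered.
roundtrip = decoded.encode("utf-8", "surrogateescape")
print(roundtrip == arg)
```

Under these assumptions the decoder never produces a code point such as 0x7fffbeba; the bytes survive as six escaped surrogates that encode back to the original argument.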