Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
>
> New submission from STINNER Victor <victor.stin...@haypocalc.com>:
>
> It would be nice to support PEP 383 (surrogateescape) on Windows, but the
> mbcs codec doesn't support it for performance reasons. The Windows
> functions to encode/decode MBCS don't give the index of the
> unencodable/undecodable character/byte. For encoding, we can try to encode
> character by character (but be careful of surrogate pairs) and check
> whether the character is a Python lone surrogate character (a character in
> the range U+DC80..U+DCFF). For decoding, it is more complex because MBCS
> can be a multibyte encoding, e.g. cp65001 (Microsoft variant of utf-8, see
> #6058). So it's not possible to decode byte by byte, and we would have to
> write a heuristic to guess the right number of bytes for each call to the
> decode function.
>
> --
>
> A completely different solution is to get the MBCS code page and use the
> Python code page codec (e.g. "cp1252") instead of the "mbcs" encoding,
> because Python cpXXXX codecs support all Python error handlers. Example
> (with Python 2.6):
>
> >>> print(u"abcŁdef".encode("cp1252", "replace"))
> abc?def
> >>> print(u"abcŁdef".encode("cp1252", "ignore"))
> abcdef
> >>> print(u"abcŁdef".encode("cp1252", "backslashreplace"))
> abc\u0141def
That would certainly be a better approach, provided that our cp-encodings
are indeed compatible with the Windows variants (which unfortunately often
use slightly different mappings).

We could then also alias 'mbcs' to the cp-encoding (sort of the reverse of
what we do in site.py:aliasmbcs()).

----------
nosy: +lemburg
title: Support PEP 383 on Windows: mbcs support of surrogateescape error handler -> Support PEP 383 on Windows: mbcs support of surrogateescape error handler

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9821>
_______________________________________
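
To make the aliasing idea above concrete, here is a minimal Python 3 sketch, not a patch for this issue. It registers a codec search function that resolves an alias to a Python cpXXXX codec, then round-trips an undecodable byte through surrogateescape. The alias name "mbcs_compat", the helper _search and the hard-coded code page 1252 are assumptions made for illustration only; a real change would have to query the active code page on Windows and deal with the mapping differences mentioned above.

```python
import codecs

# Hypothetical alias name, chosen for this sketch only; it is not part of
# Python and not the name proposed in this issue.
ALIAS = "mbcs_compat"
# Assumed ANSI code page; hard-coded here instead of being queried from Windows.
CODE_PAGE = 1252


def _search(name):
    """Codec search function: resolve ALIAS to the Python cpXXXX codec."""
    if name == ALIAS:
        return codecs.lookup("cp%d" % CODE_PAGE)
    return None


codecs.register(_search)

# Byte 0x81 is undefined in Python's cp1252 table, so surrogateescape turns
# it into the lone surrogate U+DC81 on decoding and restores it on encoding.
raw = b"abc\x81def"
text = raw.decode(ALIAS, "surrogateescape")
roundtrip = text.encode(ALIAS, "surrogateescape")
```

Because the alias resolves to an ordinary charmap codec, the other error handlers ("replace", "ignore", "backslashreplace") work through it as well. The byte 0x81 also illustrates the compatibility concern: Python's table leaves it undefined, and the Windows conversion functions may not treat it the same way.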