Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
> 
> New submission from STINNER Victor <victor.stin...@haypocalc.com>:
> 
> It would be nice to support PEP 383 (surrogateescape) on Windows, but the 
> mbcs codec doesn't support it for performance reasons. The Windows functions 
> to encode/decode MBCS don't give the index of the unencodable/undecodable 
> character/byte. For encoding, we can try to encode character by character 
> (but be careful of surrogate pairs) and check whether the character is a 
> Python lone surrogate character (in the range U+DC80..U+DCFF). For 
> decoding, it is more complex because MBCS can be a multibyte encoding, e.g. 
> cp65001 (Microsoft variant of utf-8, see #6058). So it's not possible to 
> decode byte by byte, and we would have to write a heuristic to guess the 
> right number of bytes for each call to the decode function.
> 
> --
> 
> A completely different solution is to get the MBCS code page and use the 
> Python code page codec (e.g. "cp1252") instead of the "mbcs" encoding, because 
> Python cpXXXX codecs support all Python error handlers. Example (with Python 
> 2.6):
> 
> >>> print(u"abcŁdef".encode("cp1252", "replace"))
> abc?def
> >>> print(u"abcŁdef".encode("cp1252", "ignore"))
> abcdef
> >>> print(u"abcŁdef".encode("cp1252", "backslashreplace"))
> abc\u0141def

That would certainly be a better approach, provided that our
cp-encodings are indeed compatible with the Windows variants
(which unfortunately tend to often use slightly different
mappings).
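
To illustrate what the cp-codec route would buy us, here is a rough
sketch (Python 3; roundtrip() is just a helper name made up for this
example). It relies on byte 0x81 having no mapping in Python's cp1252
table, so the surrogateescape handler kicks in. Whether the Windows
code page treats that byte the same way is exactly the compatibility
question above.

```python
def roundtrip(raw, encoding="cp1252"):
    """Decode raw bytes and encode them back, both with surrogateescape."""
    # Undecodable bytes become lone surrogates in U+DC80..U+DCFF.
    text = raw.decode(encoding, "surrogateescape")
    # Encoding turns those lone surrogates back into the original bytes.
    return text, text.encode(encoding, "surrogateescape")


# 0x81 is undefined in Python's cp1252 decoding table.
text, raw = roundtrip(b"abc\x81def")
print(ascii(text))
print(raw)
```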

We could then also alias 'mbcs' to the cp-encoding (sort of
like the reverse of what we do in site.py:aliasmbcs()).
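
A minimal sketch of such an alias, using a codec search function
(Python 3). The alias name "mbcs_via_cp" and the hard-coded target
"cp1252" are assumptions made for illustration only; a real patch
would have to look up the active code page and would presumably
rebind 'mbcs' itself rather than add a new name.

```python
import codecs


def make_alias_search(alias, target):
    """Return a codec search function resolving `alias` to `target`."""
    def search(name):
        # Codec names arrive normalized to lower case.
        if name == alias:
            return codecs.lookup(target)
        return None
    return search


# Hypothetical alias name; "cp1252" stands in for the real code page.
codecs.register(make_alias_search("mbcs_via_cp", "cp1252"))

# The alias now accepts every error handler the cp codec supports.
encoded = "abc\u0141def".encode("mbcs_via_cp", "replace")
print(encoded)
```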

----------
nosy: +lemburg
title: Support PEP 383 on Windows: mbcs support of surrogateescape error 
handler -> Support PEP 383 on Windows: mbcs support of surrogateescape error 
handler

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9821>
_______________________________________