[issue8308] raw_bytes.decode('cp932') -- spurious mappings

John Machin Sat, 03 Apr 2010 16:40:25 -0700

New submission from John Machin <sjmac...@users.sourceforge.net>:

According to the following references, the bytes 80, A0, FD, FE, and FF are not 
defined in cp932:


http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL

However CPython 3.1.2 does this:

 >>> print(ascii(b'\x80\xa0\xfd\xfe\xff'.decode('cp932')))
 '\x80\uf8f0\uf8f1\uf8f2\uf8f3'

(as do 2.5, 2.6, and 2.7 with the appropriate syntax)

This maps 80 to U+0080 (not very useful) and maps the other four bytes into the
Private Use Area ("PUA"). Each case should instead be treated as undefined and
reported as an error.
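
Until the codec itself is changed, a caller can reject these mappings after
decoding. The following is only a sketch for Python 3: the helper name
strict_cp932_decode is hypothetical, and it assumes that U+0080 and
U+F8F0..U+F8F3 never appear in legitimate cp932 output, which is consistent
with the references above. It checks decoded characters rather than raw bytes
because 80 and A0 are valid trail bytes in double-byte sequences (for example
82 A0).

```python
# Code points the cp932 decoder was reported to emit for the undefined
# bytes 80, A0, FD, FE and FF (taken from the output shown in this report).
SPURIOUS = frozenset("\x80\uf8f0\uf8f1\uf8f2\uf8f3")


def strict_cp932_decode(raw):
    """Hypothetical helper: decode cp932 but refuse the spurious mappings."""
    # The codec may raise UnicodeDecodeError by itself for other bad input.
    text = raw.decode("cp932")
    for index, char in enumerate(text):
        if char in SPURIOUS:
            # ValueError is the base class of UnicodeDecodeError, so callers
            # can catch both kinds of failure with one except clause.
            raise ValueError(
                "undefined cp932 byte decoded to U+%04X at character %d"
                % (ord(char), index)
            )
    return text
```

Catching ValueError around the call covers both this check and any
UnicodeDecodeError raised by the codec itself.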

----------
components: Unicode
messages: 102308
nosy: sjmachin
severity: normal
status: open
title: raw_bytes.decode('cp932') -- spurious mappings
type: behavior
versions: Python 2.7, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8308>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
