New submission from Mingye Wang (Arthur2e5):

Microsoft's cp936 defines a euro sign at 0x80, but Python raises
UnicodeEncodeError when asked to do something like
`u'\u20ac'.encode('cp936')`. This may break things for zh-Hans-CN Windows
users who want to put a euro sign in a file name (if they insist on using a
non-Unicode str for open() in py2, well.)
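
A minimal sketch of the check, in Python 3 syntax. `try_encode` is a
hypothetical helper written for this report, not part of any library; it only
reports whether a codec accepts a string, so it makes no assumption about
which interpreter version is in use:

```python
import codecs


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec rejects the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # Shows which codec the 'cp936' name actually resolves to.
    print(codecs.lookup("cp936").name)
    # None here means the euro sign could not be encoded.
    print(try_encode("\u20ac", "cp936"))
```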

According to the codecs documentation, 'cp936' is an alias for the gbk codec,
and "GBK" is itself a very ambiguous name that is subject to confusion:

The name "GBK" might refer to any of the four commonly known members of the 
family of EUC-CN (gb2312) extensions that fully cover the Unicode 1.1 CJK 
Unified Ideographs block:
  1) The original GBK. Rust-Encoding says that it's in a normative annex of 
GB13000.1-1993, but the closest thing I can find in my archive.org copy of that 
standard is an annex on an EUC (GB/T 2311) UCS.
  2) IANA GBK, or Microsoft cp936. This is the one with the euro sign I am 
looking for.
  3) GBK 1.0, a recommendation from the official standardization committees 
based on cp936. It's roughly cp936 without the euro sign but with 95 
additional PUA code points.
  4) W3C TR GBK. This GBK is basically gb18030-2005 without four-byte UTF, and 
with the euro sign. Roughly a union of 2) and 3) with some PUA code points 
moved into the right place.
 
Looking at Modules/cjkcodecs/_codecs_cn.c @ 104259:36b052adf5a7, Python seems 
to be doing either 1) or 3). For a quick fix you can just make an additional 
cp936 encoding around the gbk encoding that handles U+20AC; for some excitement 
(of potentially breaking stuff) you can join the web people and use either 2) 
or 4).
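
The quick fix could look roughly like the sketch below. It is only an
illustration under stated assumptions: the codec name `cp936_euro` and the
functions `cp936_encode`, `cp936_decode` and `_search` are hypothetical, and
it wraps the stock gbk codec from pure Python instead of touching
Modules/cjkcodecs. The decoder has to scan for lead bytes because 0x80 is
also a legal trail byte in GBK two-byte sequences:

```python
import codecs

EURO = "\u20ac"


def cp936_encode(text, errors="strict"):
    # Encode the euro-free pieces with the stock gbk codec and join them
    # with the single byte 0x80 that Microsoft's cp936 assigns to U+20AC.
    pieces = [codecs.encode(part, "gbk", errors) for part in text.split(EURO)]
    return b"\x80".join(pieces), len(text)


def cp936_decode(data, errors="strict"):
    data = bytes(data)
    out = []
    start = i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x80:
            # 0x80 in lead position is the euro sign; flush what precedes it.
            out.append(codecs.decode(data[start:i], "gbk", errors))
            out.append(EURO)
            i += 1
            start = i
        elif 0x81 <= byte <= 0xFE:
            # Lead byte of a two-byte sequence; skip its trail byte,
            # which may itself be 0x80.
            i += 2
        else:
            i += 1
    out.append(codecs.decode(data[start:], "gbk", errors))
    return "".join(out), len(data)


def _search(name):
    # Hypothetical codec name, chosen so the existing 'cp936' alias is untouched.
    if name == "cp936_euro":
        return codecs.CodecInfo(
            name="cp936_euro", encode=cp936_encode, decode=cp936_decode
        )
    return None


codecs.register(_search)
```

With this registered, `u'\u20ac'.encode('cp936_euro')` goes through the
wrapper while everything else is delegated to gbk. Incremental and stream
interfaces are left out of the sketch.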

----------
components: Unicode
messages: 277925
nosy: Mingye Wang (Arthur2e5), ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: Bad encoding alias cp936 -> gbk: euro sign
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28343>
_______________________________________
