New submission from Ma Lin:

hz is a Simplified Chinese codec, available in Python since around 2004.

However, the hz encoder has a serious bug: it forgets to escape ~
>>> 'hi~'.encode('hz')
b'hi~'    # the correct output should be b'hi~~'

As a result, we can't complete a round trip:
>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete multibyte
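
For illustration, here is a minimal sketch of the escaping rule the encoder is expected to follow. It only handles pure-ASCII input, and hz_escape_ascii is a hypothetical helper name, not anything in the standard library. It doubles each ~ by hand and then relies on the existing hz decoder to read the result back.

```python
def hz_escape_ascii(text):
    """Encode ASCII-only text as HZ bytes (hypothetical helper, ASCII input assumed)."""
    # In HZ, '~' introduces an escape sequence, so a literal '~' is written as '~~'.
    return text.encode('ascii').replace(b'~', b'~~')


encoded = hz_escape_ascii('hi~')
print(encoded)               # b'hi~~'
print(encoded.decode('hz'))  # hi~
```

With the tilde doubled, decoding succeeds and returns the original string, which is the round trip that fails above.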

In all these years, no one has reported this bug, so I think it's pretty safe to 
remove the hz codec.

FYI:
The HZ codec is a 7-bit wrapper for GB2312 that was formerly common in email and 
USENET postings. It was designed in 1989 by Fung Fung Lee, and subsequently 
codified in 1995 into RFC 1843.
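
To make "7-bit wrapper" concrete, here is a rough sketch that builds HZ bytes by hand from GB2312 output. It assumes the input consists only of GB2312 double-byte characters (no ASCII mixed in), and hz_wrap is a hypothetical name used just for this example.

```python
def hz_wrap(text):
    """Build HZ bytes by hand (hypothetical helper; GB2312 double-byte input assumed)."""
    gb = text.encode('gb2312')
    # Each GB2312 double-byte character has the high bit set in both bytes.
    # HZ clears that bit and brackets the run with the ~{ and ~} escapes.
    return b'~{' + bytes(b & 0x7F for b in gb) + b'~}'


wrapped = hz_wrap('中文')
print(wrapped)                # b'~{VPND~}'
print(max(wrapped) < 0x80)    # True: every byte is 7-bit
print(wrapped.decode('hz'))   # 中文
```

Since ~ doubles as the escape introducer for these sequences, a literal ~ in ASCII text has to be written as ~~.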

It was popular on USENET networks, which in the late 1980s and early 1990s 
generally did not allow transmission of 8-bit characters or escape characters.

https://en.wikipedia.org/wiki/HZ_(character_encoding)

Do other languages have an hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]

[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html

----------
components: Unicode
messages: 291207
nosy: Ma Lin, ezio.melotti, haypo, xiang.zhang
priority: normal
severity: normal
status: open
title: Remove hz codec
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30003>
_______________________________________