New submission from Ma Lin: hz is a Simplified Chinese codec, available in Python since around 2004.
However, the hz encoder has a serious bug: it forgets to escape ~

>>> 'hi~'.encode('hz')
b'hi~'    # the correct output should be b'hi~~'

As a result, we can't complete a round trip:

>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete multibyte

In all these years, no one has reported this bug, so I think it's pretty safe to remove the hz codec.

FYI:
The HZ codec is a 7-bit wrapper for GB2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee, and subsequently codified in 1995 into RFC 1843.
It was popular on USENET networks, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.
https://en.wikipedia.org/wiki/HZ_(character_encoding)

Do other languages have an hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]

[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html

----------
components: Unicode
messages: 291207
nosy: Ma Lin, ezio.melotti, haypo, xiang.zhang
priority: normal
severity: normal
status: open
title: Remove hz codec
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30003>
_______________________________________
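
For anyone who wants to check their own interpreter, the round-trip failure described in this report can be probed with a small helper. This is only a sketch: hz_roundtrips is a hypothetical name, not part of the codecs module, and the result for 'hi~' depends on whether the interpreter's encoder escapes ~, so no particular output is claimed for it.

```python
def hz_roundtrips(text: str) -> bool:
    """Return True if text survives encode('hz') followed by decode('hz').

    Hypothetical helper for illustration; not part of the standard library.
    """
    try:
        return text.encode("hz").decode("hz") == text
    except UnicodeError:
        # Covers both UnicodeEncodeError (characters outside GB2312) and
        # UnicodeDecodeError (encoder output that the decoder rejects).
        return False


if __name__ == "__main__":
    # 'hi~' is the case from this report; its result depends on the interpreter.
    for sample in ("hi", "你好", "hi~"):
        print(repr(sample), hz_roundtrips(sample))
```

Plain ASCII without a tilde and GB2312 text should round-trip either way; only input containing ~ exercises the escaping in question.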