New submission from John Machin <sjmac...@users.sourceforge.net>:

Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed "Constraints on Conversion Processes") after definition D93. Recent Pythons, e.g. 3.1.2, don't comply. Using the Unicode example:

>>> print(ascii(b"\xc2\x41\x42".decode('utf8', 'replace')))
'\ufffdB'
# should produce u'\ufffdAB'

Resynchronisation currently starts at a position derived by considering the length implied by the start byte:

>>> print(ascii(b"\xf1ABCD".decode('utf8', 'replace')))
'\ufffdD'
# should produce u'\ufffdABCD'; resync should start from the *failing* byte.

Notes: This applies to the 'ignore' option as well as the 'replace' option. The Unicode discussion mentions "security exploits".

----------
messages: 101972
nosy: sjmachin
severity: normal
status: open
title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
type: behavior
versions: Python 2.7, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________
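
The two cases reported above could be bundled into a small regression check, sketched here. The names CASES and check_resync are hypothetical and not part of any existing test suite. The 'replace' expectations are the values stated in the report. The 'ignore' expectations are an assumption inferred from the note that 'ignore' is affected in the same way.

```python
# Each entry: (input bytes, error handler, expected decoded text).
# 'replace' rows come straight from the report; 'ignore' rows are inferred
# (assumption): only the invalid byte is dropped, the rest is kept.
CASES = [
    (b"\xc2\x41\x42", "replace", "\ufffdAB"),
    (b"\xf1ABCD", "replace", "\ufffdABCD"),
    (b"\xc2\x41\x42", "ignore", "AB"),
    (b"\xf1ABCD", "ignore", "ABCD"),
]


def check_resync(cases=CASES):
    """Return (data, errors, actual, expected) for every case that fails."""
    failures = []
    for data, errors, expected in cases:
        actual = data.decode("utf8", errors)
        if actual != expected:
            failures.append((data, errors, actual, expected))
    return failures


if __name__ == "__main__":
    for data, errors, actual, expected in check_resync():
        print("%r (%s): got %s, expected %s"
              % (data, errors, ascii(actual), ascii(expected)))
```

On an interpreter that shows the behaviour reported here, the script prints one line per failing case; on one that resynchronises from the failing byte, it prints nothing.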