At least two standard error handlers are documented as working for encoding only:

    xmlcharrefreplace
    backslashreplace

See
http://docs.python.org/library/codecs.html#codec-base-classes
and
http://docs.python.org/py3k/library/codecs.html

Why is this? I don't see why they shouldn't work for decoding as well.
Consider this example using Python 3.2:

>>> b"aaa--\xe9z--\xe9!--bbb".decode("cp932")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'cp932' codec can't decode bytes in position 9-10: illegal multibyte sequence

The two bytes b'\xe9!' are an illegal multibyte sequence for CP-932
(also known as MS-KANJI or SHIFT-JIS). Is there some reason why handling
them with these error handlers shouldn't or can't be supported when
decoding?

# This doesn't actually work.
b"aaa--\xe9z--\xe9!--bbb".decode("cp932", "backslashreplace")
=> r'aaa--騷--\xe9\x21--bbb'

and similarly for xmlcharrefreplace.

--
Steven
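
P.S. One possible workaround is to register a custom error handler with
codecs.register_error() and pass its name to decode(). What follows is only
a minimal sketch: the handler name "backslashreplace_decode" is made up for
illustration and is not part of the standard library. The exact span of bytes
the cp932 decoder reports as bad may differ between Python versions, so no
output is shown.

```python
import codecs


def backslashreplace_decode(exc):
    """Hypothetical handler: replace undecodable bytes with \\xNN escapes."""
    if not isinstance(exc, UnicodeDecodeError):
        # This sketch only handles decoding errors; re-raise anything else.
        raise exc
    # exc.object is the bytes being decoded; start/end delimit the bad span.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("\\x%02x" % byte for byte in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end


# Register under a name that does not clash with the built-in handlers.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

text = b"aaa--\xe9z--\xe9!--bbb".decode("cp932", "backslashreplace_decode")
print(text)
```

The handler returns a (replacement, resume position) tuple, which is the
contract codecs.register_error() documents for error callbacks. Resuming at
exc.end means only the bytes the decoder flagged are escaped.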