New submission from Nick Coghlan:

Passing the wrong types to codecs can currently lead to rather confusing 
exceptions, like:

====================
>>> b"ZXhhbXBsZQ==\n".decode("base64_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/base64_codec.py", line 20, in base64_decode
    return (base64.decodebytes(input), len(input))
  File "/usr/lib64/python3.2/base64.py", line 359, in decodebytes
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not memoryview
====================
>>> codecs.decode("example", "utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.2/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface
====================

This situation could be improved by having the affected APIs use the exception 
chaining system to wrap these errors in a more informative exception that also 
displays information on the codec involved. Note that UnicodeEncodeError and 
UnicodeDecodeError are not appropriate, as those are specific to text encoding 
operations, while these new wrappers will apply to arbitrary codecs, regardless 
of whether or not they use the unicode error handlers. Furthermore, for 
backwards compatibility with existing exception handling, it is probably 
necessary to limit ourselves to specific exception types and ensure that the 
wrapper exceptions are subclasses of those types.
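
To illustrate the backwards compatibility point, here is a minimal sketch. 
DecodeTypeError is the name suggested in this proposal, not an existing 
codecs API, and legacy_caller and failing_decode are hypothetical stand-ins 
for existing user code and for a codec that rejects its input type:

```python
# Hypothetical wrapper type: the name comes from this proposal, the body is
# an assumption. Subclassing TypeError keeps existing handlers working.
class DecodeTypeError(TypeError):
    """Would be raised in place of a bare TypeError from a codec."""


def legacy_caller(decode):
    # Stand-in for pre-existing code that only knows about TypeError.
    try:
        return decode()
    except TypeError as exc:
        return "handled: %s" % type(exc).__name__


def failing_decode():
    # Stand-in for a codec call that rejects its input type.
    try:
        raise TypeError("'str' does not support the buffer interface")
    except TypeError as exc:
        # Exception chaining: the original error is kept as __cause__.
        raise DecodeTypeError("encoding='utf8'") from exc


print(legacy_caller(failing_decode))
```

Because the wrapper is a TypeError subclass, the old "except TypeError" 
clause still catches it without modification.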

These new wrappers would have __cause__ set to the exception raised by the 
codec, but emit a message more along the lines of the following:

==============
codecs.DecodeTypeError: encoding='utf8', details="TypeError: 'str' does not support the buffer interface"
==============

Wrapping TypeError and ValueError should cover most cases, which would mean 
four new exception types in the codecs module:

Raised by codecs.decode, bytes.decode and bytearray.decode:
* codecs.DecodeTypeError
* codecs.DecodeValueError

Raised by codecs.encode and str.encode:
* codecs.EncodeTypeError
* codecs.EncodeValueError

Instances of UnicodeError wouldn't be wrapped, since they already contain codec 
information.
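
A rough sketch of how the decode side might behave, written as a pure Python 
helper. The exception classes use the names proposed above but are defined 
locally, since they do not exist in the codecs module. wrapped_decode and 
_describe are hypothetical helper names, and the message format simply 
mirrors the example earlier in this message:

```python
import codecs


# Hypothetical types named after this proposal; not part of codecs today.
class DecodeTypeError(TypeError):
    pass


class DecodeValueError(ValueError):
    pass


def _describe(encoding, exc):
    # Mirror the proposed message: codec name plus the original error.
    details = "%s: %s" % (type(exc).__name__, exc)
    return "encoding=%r, details=%r" % (encoding, details)


def wrapped_decode(data, encoding):
    """Decode via codecs.decode, wrapping TypeError and ValueError."""
    try:
        return codecs.decode(data, encoding)
    except UnicodeError:
        # UnicodeError is a ValueError subclass, so it has to be checked
        # first. It already carries codec information and passes through.
        raise
    except TypeError as exc:
        raise DecodeTypeError(_describe(encoding, exc)) from exc
    except ValueError as exc:
        raise DecodeValueError(_describe(encoding, exc)) from exc
```

With this helper, wrapped_decode("example", "utf8") raises DecodeTypeError 
with the original TypeError available as __cause__, while 
wrapped_decode(b"\xff", "utf8") still raises a plain UnicodeDecodeError. The 
encode side would be symmetric, using EncodeTypeError and EncodeValueError.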

----------
components: Library (Lib)
messages: 187704
nosy: ncoghlan
priority: normal
severity: normal
status: open
title: More informative error handling when encoding and decoding
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17828>
_______________________________________
