Nick Coghlan <ncogh...@gmail.com> added the comment:

Some further comments after getting back up to speed with the actual status of 
this problem (i.e. that we had issues with the error checking and reporting in 
the original 3.2 commit).

1. I agree with the position that the codecs module itself is intended to be a 
type-neutral codec registry. It encodes and decodes things, but shouldn't 
actually care about the types involved. If that is currently not the case in 
3.x, it needs to be fixed.

This type neutrality was blurred in 2.x by the fact that it only implemented 
str->str translations, and even further obscured by the coupling to the 
.encode() and .decode() convenience APIs. The fact that the type neutrality of 
the registry itself is currently broken in 3.x is a *regression*, not an 
improvement. (The convenience APIs, on the other hand, are definitely *not* 
type neutral, and aren't intended to be.)
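For comparison, the module-level codecs.encode() and codecs.decode() functions in current Python 3 releases (well after this comment was written) act as the type-neutral registry interface described above: they hand the object to whichever codec is named and return whatever that codec produces, while str.encode() stays restricted. A minimal sketch, assuming a current Python 3 interpreter:

```python
import codecs

# Character encoding codec: str -> bytes, and back again.
encoded = codecs.encode("hello", "utf-8")
decoded = codecs.decode(encoded, "utf-8")

# Text transform codec: str -> str.
rotated = codecs.encode("hello", "rot_13")

# Binary transform codec: bytes -> bytes.
hexed = codecs.encode(b"hello", "hex")

for value in (encoded, decoded, rotated, hexed):
    print(type(value).__name__, repr(value))

# The type-specific convenience method refuses the text transform codec.
try:
    "hello".encode("rot_13")
except LookupError as exc:
    print("str.encode() refused:", exc)
```

The registry functions never inspect the input type themselves; any type restriction lives in the convenience method.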

2. To assist in producing nice error messages, and to allow restrictions to be 
enforced on type-specific convenience APIs, the CodecInfo objects should grow 
additional state as MAL suggests. To avoid redundancy (and inaccurate 
overspecification), my suggested colour for that particular bikeshed is:

Character encoding codec:
  .decoded_format = 'text'
  .encoded_format = 'binary'

Binary transform codec:
  .decoded_format = 'binary'
  .encoded_format = 'binary'

Text transform codec:
  .decoded_format = 'text'
  .encoded_format = 'text'
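The three declarations above can be modelled as a pair of labels from which the codec category follows. This is a hypothetical sketch: CodecFormats, FORMATS and kind() are illustrative names, and the real codecs.CodecInfo class does not define decoded_format or encoded_format.

```python
from dataclasses import dataclass

TEXT = "text"
BINARY = "binary"

# Category implied by each (decoded_format, encoded_format) pair.
_KINDS = {
    (TEXT, BINARY): "character encoding",
    (BINARY, BINARY): "binary transform",
    (TEXT, TEXT): "text transform",
}


@dataclass(frozen=True)
class CodecFormats:
    """Hypothetical holder for the two proposed format attributes."""
    decoded_format: str
    encoded_format: str

    def kind(self) -> str:
        """Name the codec category implied by the two format labels."""
        return _KINDS.get((self.decoded_format, self.encoded_format), "unclassified")


# Assumed declarations for a few well-known codecs.
FORMATS = {
    "ascii": CodecFormats(TEXT, BINARY),
    "bz2": CodecFormats(BINARY, BINARY),
    "rot_13": CodecFormats(TEXT, TEXT),
}

for name, fmt in FORMATS.items():
    print(f"{name}: {fmt.decoded_format} <-> {fmt.encoded_format} ({fmt.kind()})")
```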

I suggest using the fuzzy format labels mainly because of the buffer API: 
most codec operations that consume binary data will accept anything that 
implements the buffer API, so referring specifically to 'bytes' in error 
messages would be inaccurate.
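A short illustration of the buffer API point, assuming current Python 3, where the zlib and hex binary transform codecs are reachable through codecs.encode(): bytes, bytearray and memoryview inputs are all accepted by the same codec, so an error message demanding 'bytes' would misdescribe what works.

```python
import codecs

payload = b"hello"

# Three different objects that all expose the buffer protocol.
inputs = [payload, bytearray(payload), memoryview(payload)]

# The same binary transform codec consumes each of them.
compressed = [codecs.encode(obj, "zlib") for obj in inputs]
hexed = [codecs.encode(obj, "hex") for obj in inputs]

# All three inputs produce identical output.
print(len(set(compressed)) == 1, hexed[0])
```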

The convenience APIs can then emit errors like:

  'a'.encode('rot_13') ==>
  CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)

  b'a'.decode('rot_13') ==>
  CodecLookupError: text <-> binary codec expected ('rot_13' is text <-> text)

  'a'.transform('bz2') ==>
  CodecLookupError: text <-> text codec expected ('bz2' is binary <-> binary)

  'a'.transform('ascii') ==>
  CodecLookupError: text <-> text codec expected ('ascii' is text <-> binary)

  b'a'.transform('ascii') ==>
  CodecLookupError: binary <-> binary codec expected ('ascii' is text <-> binary)
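A sketch of how a convenience API could produce exactly these messages. It is hypothetical throughout: CodecLookupError, FORMATS and check_codec() are illustrative names, the format table is assumed, and str.transform()/bytes.transform() are the methods proposed in this discussion rather than existing ones.

```python
# Hypothetical sketch: CodecLookupError, FORMATS and check_codec() are
# illustrative names for this proposal, not part of any released Python.
TEXT, BINARY = "text", "binary"


class CodecLookupError(LookupError):
    """Raised when a convenience API is handed a codec of the wrong kind."""


# Assumed (decoded_format, encoded_format) declarations for a few codecs.
FORMATS = {
    "ascii": (TEXT, BINARY),
    "bz2": (BINARY, BINARY),
    "rot_13": (TEXT, TEXT),
}


def check_codec(name, expected):
    """Raise CodecLookupError unless codec `name` has the expected formats."""
    # Codecs that declare nothing are treated as text <-> binary.
    actual = FORMATS.get(name, (TEXT, BINARY))
    if actual != expected:
        raise CodecLookupError(
            "{} <-> {} codec expected ({!r} is {} <-> {})".format(
                expected[0], expected[1], name, actual[0], actual[1]
            )
        )


# Requirements of each convenience API under the proposal:
#   str.encode(), bytes.decode() -> text <-> binary
#   str.transform()              -> text <-> text     (proposed method)
#   bytes.transform()            -> binary <-> binary (proposed method)
calls = [
    ("rot_13", (TEXT, BINARY)),
    ("bz2", (TEXT, TEXT)),
    ("ascii", (TEXT, TEXT)),
    ("ascii", (BINARY, BINARY)),
]
for name, expected in calls:
    try:
        check_codec(name, expected)
    except CodecLookupError as exc:
        print("CodecLookupError:", exc)
```

Subclassing LookupError keeps existing "except LookupError" handlers working.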

For backwards compatibility with 3.2, codecs that do not specify their formats 
should be treated as character encoding codecs (i.e. decoded format is 'text', 
encoded format is 'binary').
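The backwards-compatibility rule amounts to reading the proposed attributes with defaults. In this sketch codec_formats() and the demo_rot13 codec are hypothetical names, and decoded_format/encoded_format are the attributes proposed above (codecs.CodecInfo does not define them). It also assumes CodecInfo instances accept extra attributes, which holds for CPython's pure-Python class.

```python
import codecs


def codec_formats(name):
    """Return (decoded_format, encoded_format) for a registered codec.

    Codecs that do not declare the (hypothetical) format attributes fall
    back to the 3.2-compatible 'text' <-> 'binary' pair.
    """
    info = codecs.lookup(name)
    return (getattr(info, "decoded_format", "text"),
            getattr(info, "encoded_format", "binary"))


def _search_demo_codec(name):
    """Search function for a hypothetical codec that declares its formats."""
    if name != "demo_rot13":
        return None
    rot13 = codecs.lookup("rot_13")
    info = codecs.CodecInfo(rot13.encode, rot13.decode, name="demo_rot13")
    info.decoded_format = "text"   # proposed attribute, set by hand
    info.encoded_format = "text"
    return info


codecs.register(_search_demo_codec)

print(codec_formats("utf-8"))       # ('text', 'binary'): nothing declared
print(codec_formats("demo_rot13"))  # ('text', 'text'): declared explicitly
```

Under this rule, stock transform codecs such as rot_13 would also fall through to the default until they declare their formats.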

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7475>
_______________________________________