On Tue, 04 Jan 2005 16:41:05 +0100, Thomas Heller <[EMAIL PROTECTED]> wrote:
>Skip Montanaro <[EMAIL PROTECTED]> writes:
>
> >     michele> BTW what's the difference between .encode and .decode ?
> >
> > I started to answer, then got confused when I read the docstrings for
> > unicode.encode and unicode.decode:
> >
> > [snip - docstrings]
> >
> > It probably makes sense to one who knows, but for the feeble-minded like
> > myself, they seem about the same.
>
> It seems also the error messages aren't too helpful:
>
> >>> "ä".encode("latin-1")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0:
> ordinal not in range(128)
> >>>
>
> Hm, why does the 'encode' call complain about decoding?
>
> Why do string objects have an encode method, and why do unicode objects
> have a decode method, and what does this error message want to tell me:
>
> >>> u"ä".decode("latin-1")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position
> 0: ordinal not in range(128)
> >>>

The call unicode.decode(codec) is actually doing this:

    unicode.encode(sys.getdefaultencoding()).decode(codec)

Likewise, str.encode(codec) first decodes the byte string with the
default encoding, which is why your 'encode' call complains about
decoding: the implicit ASCII step fails before latin-1 is ever used.

This is not a particularly nice thing.  I'm not sure who thought it was
a good idea.  One possibility is that .encode() and .decode() are not
_only_ for converting between unicode and encoded bytestrings.  For
example, there is the zlib codec, the rot13 codec, and applications can
define their own codecs with arbitrary behavior.  It's entirely possible
to write a codec that decodes _from_ unicode objects _to_ unicode objects
and encodes the same way.  So unicode objects need both methods to
support this use case.

Jp
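
P.S. Here is a minimal sketch of the point about codecs that are not text encodings. It is an illustration only, and it assumes a modern Python 3 interpreter rather than the 2.x one in the tracebacks above. There, str.encode() and bytes.decode() accept only text encodings, so the sketch goes through codecs.encode() and codecs.decode(). The variable names are made up for the example.

```python
import codecs

# rot_13 maps text to text: a str goes in and a str comes out,
# so neither direction involves bytes at all.
scrambled = codecs.encode("Hello, world", "rot_13")
restored = codecs.decode(scrambled, "rot_13")

# zlib_codec maps bytes to bytes: "encoding" compresses and
# "decoding" decompresses, with no text involved.
payload = b"spam " * 20
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

print(scrambled)                  # Uryyb, jbeyq
print(restored)                   # Hello, world
print(unpacked == payload)        # True
```

Both round trips stay within one type, which is the case that needs both methods to be reachable from the same kind of object.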