On 10/8/2010 9:45 AM, Hallvard B Furuseth wrote:

> Actually, the implicit contract of __str__ is that it never fails, so
> that everything can be printed out (for debugging purposes, etc.).

Nope:

$ python2 -c 'str(u"\u1000")'
Traceback (most recent call last):
   File "<string>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1000' in position 
0: ordinal not in range(128)

This could be considered a design bug: 'str' was used both to produce a readable string representation of an object (perhaps one that could be eval'ed) and to convert a unicode object to an equivalent string object, which is not the same operation!

The above really should have produced '\u1000'! (the equivalent of what str(bytes) does today). The 'conversion to an equivalent str object' operation should have required an explicit encoding arg rather than defaulting to the ascii codec. This mistake has been corrected in 3.x, so Yep.
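To make the 3.x behavior concrete, here is a minimal illustration (Python 3) of the two distinct operations: str() always succeeds and gives a readable representation, while conversion to text demands an explicit codec:

```python
data = b"\xa0"

# Operation 1: readable representation -- never fails, any object.
print(str(data))            # the repr of the bytes object, "b'\xa0'"

# Operation 2: conversion to equivalent text -- requires an explicit
# encoding; there is no silent ascii default to trip over.
print(data.decode("latin-1"))
```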

And the equivalent:

$ python2 -c 'unicode("\xA0")'
Traceback (most recent call last):
   File "<string>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal 
not in range(128)

This is an application bug: either bad string or missing decoding arg.
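The fix for that application bug carries over directly to 3.x: passing the decoding arg explicitly instead of leaning on the ascii default. A small sketch (Python 3, using latin-1 as a stand-in for whatever encoding the data actually has):

```python
raw = b"\xa0"

# Missing/wrong codec: same failure mode as the 2.x example above.
try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    print("bad or missing decoding arg:", e)

# Correct: say which encoding the bytes are in.
text = raw.decode("latin-1")
print(text)                 # NO-BREAK SPACE, U+00A0
```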

In Python 2, these two errors (one UnicodeEncodeError, one UnicodeDecodeError) made our data safe from code which used str and unicode objects without checking too carefully which was which.  Code which sorted the types out carefully enough would not fail.

In Python 3, that safety only exists for bytes(str), not str(bytes).
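A short demonstration of that asymmetry (Python 3): bytes(str) refuses to guess an encoding, while str(bytes) silently hands back the repr:

```python
# bytes(str) with no encoding raises -- the safe direction.
try:
    bytes("abc")
except TypeError as e:
    print("bytes(str) refused:", e)

# str(bytes) does not raise; it quietly produces the repr string,
# which is rarely what mixed-type code actually wanted.
print(str(b"abc"))          # "b'abc'", not "abc"

# The explicit forms are unambiguous in both directions.
print(bytes("abc", "utf-8"))
print(b"abc".decode("utf-8"))
```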

If you prefer the buggy 2.x design (and there are *many* tracker bug reports that were fixed by the 3.x change), stick with it.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
