Steven Bethard wrote:
Yeah, I agree it's weird. I suspect if someone supplied a patch for this behavior it would be accepted -- I don't think this should break backwards compatibility (much).
Notice that the "right" thing to do would be to pass encoding and errors to __unicode__. If a string object needs to be told what encoding it is in, why not any other object as well?
Unfortunately, this was apparently overlooked, and now it is too late to change (or else all existing __unicode__ methods would break when they suddenly got an encoding argument).
Could this be handled with a try / except in unicode()? Something like this:

>>> class A:
...     def u(self):  # __unicode__ with no args
...         print 'A.u()'
...
>>> class B:
...     def u(self, enc, err):  # __unicode__ with two args
...         print 'B.u()', enc, err
...
>>> def convert(obj, enc='ascii', err='strict'):  # unicode() delegates to u()
...     try:
...         obj.u(enc, err)
...     except TypeError:
...         obj.u()
...
>>> a = A()
>>> b = B()
>>> convert(a)
A.u()
>>> convert(a, 'utf-8', 'replace')
A.u()
>>> convert(b)
B.u() ascii strict
>>> convert(b, 'utf-8', 'replace')
B.u() utf-8 replace
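One caveat with that try / except dispatch: a TypeError raised *inside* the method body would also be caught, and the no-argument form silently retried. A sketch of a more careful dispatcher (hypothetical helper names, written in modern Python for illustration) inspects the method's declared parameters instead of catching the exception:

```python
import inspect

class A:
    def u(self):                          # __unicode__ with no args
        return 'A.u()'

class B:
    def u(self, enc, err):                # __unicode__ with two args
        return 'B.u() %s %s' % (enc, err)

def convert(obj, enc='ascii', err='strict'):
    # Count the parameters u() declares (self is excluded on a bound
    # method) rather than catching TypeError, so a TypeError raised
    # inside u() propagates to the caller normally.
    if len(inspect.signature(obj.u).parameters) >= 2:
        return obj.u(enc, err)
    return obj.u()

print(convert(A()))                       # -> A.u()
print(convert(B(), 'utf-8', 'replace'))   # -> B.u() utf-8 replace
```

This is only a sketch of the dispatch idea, not how unicode() actually behaves; the real builtin never passed encoding or errors to __unicode__ at all.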
As for using encoding and errors on the result of str() conversion
of the object: how can the caller know what encoding the result of
str() is in, reasonably?
The same way the caller knows the encoding of any byte string, or of the result of str(some_object): in my experience, usually by careful detective work on the source of the string or object, followed by attempts to better understand and control the encoding used throughout the application.
It seems more correct to assume that the
str() result is in the system default encoding.
To assume that in the absence of any guidance, sure, that is consistent. But to ignore the guidance the programmer attempts to provide?
One thing that hasn't been pointed out in this thread yet is that the OP could just define __unicode__() on his class to do what he wants...
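For instance, something along these lines (a hypothetical sketch; the Record class and its attributes are made up here, and the __str__ alias is only so the same code runs under Python 3, where __str__ plays the role __unicode__ played in Python 2):

```python
class Record:
    def __init__(self, raw):
        self.raw = raw  # bytes, known by the author to be UTF-8

    def __unicode__(self):
        # The class knows its own encoding, so no encoding argument
        # needs to be passed in from outside: unicode(record) in
        # Python 2 would call this method with no arguments.
        return self.raw.decode('utf-8', 'replace')

    __str__ = __unicode__   # equivalent hook under Python 3

r = Record(b'caf\xc3\xa9')
print(str(r))               # -> café
```

The point is that the object, not the caller, is usually the one in a position to know what encoding its internal data is in, which is presumably why __unicode__ takes no arguments.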
Kent -- http://mail.python.org/mailman/listinfo/python-list