Steven Bethard wrote:
Yeah, I agree it's weird. I suspect if someone supplied a patch for this behavior it would be accepted -- I don't think this should break backwards compatibility (much).
Notice that the "right" thing to do would be to pass encoding and errors to __unicode__. If a string object needs to be told what encoding it is in, why not any other object as well?
Unfortunately, this was apparently overlooked, and now it is too late to change (or else all existing __unicode__ methods would break when they suddenly got an encoding argument).
Could this be handled with a try / except in unicode()? Something like this:

>>> class A:
...     def u(self):  # __unicode__ with no args
...         print 'A.u()'
...
>>> class B:
...     def u(self, enc, err):  # __unicode__ with two args
...         print 'B.u()', enc, err
...
>>> def convert(obj, enc='ascii', err='strict'):  # unicode() delegates to u()
...     try:
...         obj.u(enc, err)
...     except TypeError:
...         obj.u()
...
>>> a = A()
>>> b = B()
>>> convert(a)
A.u()
>>> convert(a, 'utf-8', 'replace')
A.u()
>>> convert(b)
B.u() ascii strict
>>> convert(b, 'utf-8', 'replace')
B.u() utf-8 replace
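One caveat with that try / except dispatch: a TypeError raised *inside* the method body would also be caught, and the no-argument form silently retried. A sketch of a more careful dispatcher (hypothetical helper names, written in modern Python for illustration) inspects the method's declared parameters instead of catching the exception:

```python
import inspect

class A:
    def u(self):                          # __unicode__ with no args
        return 'A.u()'

class B:
    def u(self, enc, err):                # __unicode__ with two args
        return 'B.u() %s %s' % (enc, err)

def convert(obj, enc='ascii', err='strict'):
    # Count the parameters u() declares (self is excluded on a bound
    # method) rather than catching TypeError, so a TypeError raised
    # inside u() propagates to the caller normally.
    if len(inspect.signature(obj.u).parameters) >= 2:
        return obj.u(enc, err)
    return obj.u()

print(convert(A()))                       # -> A.u()
print(convert(B(), 'utf-8', 'replace'))   # -> B.u() utf-8 replace
```

This is only a sketch of the dispatch idea, not how unicode() actually behaves; the real builtin never passed encoding or errors to __unicode__ at all.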
As for using encoding and errors on the result of str() conversion
of the object: how can the caller know what encoding the result of
str() is in, reasonably?
The same way the caller knows the encoding of any byte string, or of the result of str(some_object): in my experience, usually by careful detective work on the source of the string or object, followed by attempts to better understand and control the encoding used throughout the application.
It seems more correct to assume that the
str() result is in the system default encoding.
To assume that in the absence of any guidance, sure, that is consistent. But to ignore the guidance the programmer attempts to provide?
One thing that hasn't been pointed out in this thread yet is that the OP could just define __unicode__() on his class to do what he wants...
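For instance, something along these lines (a hypothetical sketch; the Record class and its attributes are made up here, and the __str__ alias is only so the same code runs under Python 3, where __str__ plays the role __unicode__ played in Python 2):

```python
class Record:
    def __init__(self, raw):
        self.raw = raw  # bytes, known by the author to be UTF-8

    def __unicode__(self):
        # The class knows its own encoding, so no encoding argument
        # needs to be passed in from outside: unicode(record) in
        # Python 2 would call this method with no arguments.
        return self.raw.decode('utf-8', 'replace')

    __str__ = __unicode__   # equivalent hook under Python 3

r = Record(b'caf\xc3\xa9')
print(str(r))               # -> café
```

The point is that the object, not the caller, is usually the one in a position to know what encoding its internal data is in, which is presumably why __unicode__ takes no arguments.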
Kent -- http://mail.python.org/mailman/listinfo/python-list