Carsten Haese wrote:
> If that really is the line that barfs, wouldn't it make more sense to
> repr() the unicode object in the second position?
>
> import sys
> for k in sys.stdin:
>     print '%s -> %s' % (k, repr(k.decode('iso-8859-1')))
>
> Also, I'm not sure if the OP has told us the truth about his code and/or
> his error message. The implicit str() call done by formatting a unicode
> object with %s would raise a UnicodeEncodeError, not the
> UnicodeDecodeError that the OP is reporting. So either I need more
> coffee or there is something else going on here that hasn't come to
> light yet.
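The two exception types in question come from conversions in opposite directions: encoding (Unicode to bytes) raises UnicodeEncodeError, while decoding (bytes to Unicode) raises UnicodeDecodeError. A minimal sketch of that distinction, written in Python 3 (the thread itself concerns Python 2, where `unicode.encode` and `str.decode` play the same roles); the bytes 0x86 0x84 0x94 are assumed to be "åäö" as typed at a code page 437 console.

```python
# Encoding goes Unicode -> bytes: a character the codec cannot represent
# raises UnicodeEncodeError.
try:
    "åäö".encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc

# Decoding goes bytes -> Unicode: a byte the codec does not accept
# raises UnicodeDecodeError.
try:
    b"\x86\x84\x94".decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc

# Both exceptions record the offending input and where the failure started.
print(type(encode_error).__name__, encode_error)
print(type(decode_error).__name__, decode_error)
```

So a UnicodeDecodeError in the traceback means some byte string was being turned into Unicode, not the other way round.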
When mixing Unicode with byte strings, Python attempts to decode the byte string, not encode the Unicode string.

In this case, Python first inserts the non-ASCII byte string into "%s -> %s" and gets a byte string. It then attempts to insert the non-ASCII Unicode string, and realizes that it has to convert the (partially built) target string to Unicode for that to work. That conversion decodes the target with the default ASCII codec, which is what raises the *UnicodeDecodeError*:

    >>> "%s -> %s" % ("åäö", "åäö")
    '\x86\x84\x94 -> \x86\x84\x94'
    >>> "%s -> %s" % (u"åäö", u"åäö")
    u'\xe5\xe4\xf6 -> \xe5\xe4\xf6'
    >>> "%s -> %s" % ("åäö", u"åäö")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x86 ...

(the actual implementation differs a bit from the description above, but the behaviour is identical).

</F>
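Python 3 no longer performs this implicit coercion, so the rule described above can only be illustrated with a model. The following is a minimal sketch under stated assumptions: `py2_style_format` is a hypothetical helper, it handles only plain `%s` placeholders, and it mimics the described behaviour (build a byte string, then decode the partial result with ASCII as soon as a Unicode argument shows up) rather than CPython 2's actual implementation.

```python
def py2_style_format(template, args):
    """Hypothetical model of Python 2's `template % args` for a byte-string
    template whose arguments are bytes or str (unicode). Only %s is handled;
    this illustrates the coercion rule, not the real implementation."""
    pieces = template.split(b"%s")
    if len(pieces) - 1 != len(args):
        raise TypeError("not all arguments converted during string formatting")
    result = pieces[0]  # the target starts out as a byte string
    for arg, literal in zip(args, pieces[1:]):
        if isinstance(arg, str) and isinstance(result, bytes):
            # A Unicode argument forces the partially built target to become
            # Unicode, which is done by *decoding* it with the ASCII codec.
            # Any byte >= 0x80 already in the target makes this fail.
            result = result.decode("ascii")
        if isinstance(result, bytes):
            result += arg + literal
        else:
            # Once the target is Unicode, byte-string pieces are decoded too.
            if isinstance(arg, bytes):
                arg = arg.decode("ascii")
            result += arg + literal.decode("ascii")
    return result


raw = "åäö".encode("cp437")  # b'\x86\x84\x94', as typed at a cp437 console
print(py2_style_format(b"%s -> %s", (raw, raw)))        # stays bytes
print(py2_style_format(b"%s -> %s", ("åäö", "åäö")))    # becomes str

mixed_error = None
try:
    py2_style_format(b"%s -> %s", (raw, "åäö"))
except UnicodeDecodeError as exc:
    mixed_error = exc
    print(type(exc).__name__ + ":", exc)
```

The mixed case fails on byte 0x86 at position 0 of the partially built target, matching the traceback in the session above even though no Unicode string is ever encoded.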