On Tue, Jul 31, 2007 at 09:53:11AM -0700, 7stud wrote:
> s1 = "hello"
> s2 = s1.encode("utf-8")
>
> s1 = "an accented 'e': \xc3\xa9"
> s2 = s1.encode("utf-8")
>
> The last line produces the error:
>
> ---
> Traceback (most recent call last):
>   File "test1.py", line 6, in ?
>     s2 = s1.encode("utf-8")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 17: ordinal not in range(128)
> ---
>
> The error is a "decode" error, and as far as I can tell, decoding
> happens when you convert a regular string to a unicode string. So, is
> there an implicit conversion taking place from s1 to a unicode string
> before encode() is called? By what mechanism?
Yep. You are trying to encode a string, but strings are already encoded, so it generally makes no sense to call .encode() on them. Calling .encode() on a string can be handy if you want to convert it from one encoding to another; in that case, though, Python first converts the string to Unicode, and to do that it has to know how the string is currently encoded. Unless you tell it otherwise, Python assumes the string is encoded in ASCII (that is the default codec reported by sys.getdefaultencoding(), which is the implicit mechanism you're asking about). You had a byte in there that was out of ASCII's range...thus, the error. Python was trying to decode the string, assumed it was ASCII, and that didn't work.

This is all very confusing; I'd highly recommend reading this bit about Unicode. It started me down the difficult road of actually understanding what is going on here:

http://www.joelonsoftware.com/articles/Unicode.html
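A minimal Python 2 sketch of the fix, assuming the byte string really is UTF-8 data as in your example: decode it explicitly with the right codec instead of letting Python fall back to the ASCII default.

    # Python 2: decode the byte string explicitly before re-encoding it.
    s1 = "an accented 'e': \xc3\xa9"   # a byte string, already UTF-8 encoded

    u = s1.decode("utf-8")             # bytes -> unicode, using the real encoding
    s2 = u.encode("utf-8")             # unicode -> bytes again (round-trips unchanged)

    # s1.encode("utf-8") behaves like s1.decode('ascii').encode('utf-8'):
    # the implicit decode uses sys.getdefaultencoding(), which is 'ascii',
    # and byte 0xc3 is outside ASCII's range, hence the UnicodeDecodeError.
    print repr(u)    # u"an accented 'e': \xe9"
    print s2 == s1   # True

Equivalently, you could build s1 as a unicode literal in the first place (u"an accented 'e': \xe9") and only call .encode() when you actually need bytes.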