On Tue, 31 Jul 2007 13:53:11 -0300, 7stud <[EMAIL PROTECTED]> wrote:

> s1 = "hello"
> s2 = s1.encode("utf-8")
>
> s1 = "an accented 'e': \xc3\xa9"
> s2 = s1.encode("utf-8")
>
> The last line produces the error:
>
> ---
> Traceback (most recent call last):
>   File "test1.py", line 6, in ?
>     s2 = s1.encode("utf-8")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
> ---
>
> The error is a "decode" error, and as far as I can tell, decoding
> happens when you convert a regular string to a unicode string. So, is
> there an implicit conversion taking place from s1 to a unicode string
> before encode() is called? By what mechanism?

Converting from unicode characters into a string of bytes is the "encode"
operation: unicode.encode() -> str

Converting from a string of bytes to unicode characters is the "decode"
operation: str.decode() -> unicode

str.encode and unicode.decode should NOT exist, or at least should issue a
warning (IMHO).

When you call str.encode, the encode operation requires a unicode source,
so the string is first decoded using the default encoding (ASCII here, as
the traceback shows) - and that decode is what fails.

-- 
Gabriel Genellina
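
To make the hidden step visible, here is a rough sketch of the two-step behaviour described above. It is written in Python 3 syntax, where bytes and text are separate types and the decode has to be spelled out; the helper name py2_style_encode and its default_encoding parameter are made up for illustration and are not part of any real API.

```python
# Sketch only: mimics what Python 2's str.encode() is described as doing.
# py2_style_encode is a hypothetical helper, not a real library function.

def py2_style_encode(byte_string, encoding, default_encoding="ascii"):
    # Step 1 (implicit in Python 2): bytes -> unicode via the default codec.
    text = byte_string.decode(default_encoding)
    # Step 2: the encode that was actually asked for.
    return text.encode(encoding)

# Pure ASCII bytes survive the hidden decode step.
ok = py2_style_encode(b"hello", "utf-8")

# Non-ASCII bytes make step 1 fail before encode is ever reached.
try:
    py2_style_encode(b"an accented 'e': \xc3\xa9", "utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding explicitly with the codec the bytes are really in works.
text = b"an accented 'e': \xc3\xa9".decode("utf-8")
```

The usual way out is the last line: decode the byte string yourself with the encoding it is actually in, rather than calling encode on bytes and relying on the default codec.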