On Sun, Jun 9, 2013 at 2:38 AM, Νικόλαος Κούρας <nikos.gr...@gmail.com> wrote:
> On Sunday, 9 June 2013 12:20:58 PM UTC+3, Lele Gaifax wrote:
>
>> > How about a string i wonder?
>> > s = "νίκος"
>> > what_are these_bytes = s.encode('iso-8869-7').encode(utf-8')
>
>> Ignoring the usual syntax error, this is just a variant of the code I
>> posted: "s.encode('iso-8869-7')" produces a bytes instance which
>> *cannot* be "re-encoded" again in whatever encoding.
>
> s = 'a'
> s = s.encode('iso-8859-7').decode('utf-8')
> print( s )
>
> a (we got the original character back)
> ================================
> s = 'α'
> s = s.encode('iso-8859-7').decode('utf-8')
> print( s )
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0:
> unexpected end of data
>
> Why this error? because 'a' ordinal value > 127 ?
> --

No. You get that error because the bytes are not encoded in UTF-8. They are encoded in ISO-8859-7. For ASCII characters (ord(x) < 128), ISO-8859-7 and UTF-8 produce exactly the same bytes. For anything else, they differ. If you were to decode the bytes as ISO-8859-1 instead, it would succeed, but you would get the character "á" back instead of "α".

You're misunderstanding the decode function. Decode doesn't turn the data into a string with the specified encoding. It takes bytes that are *in* the specified encoding and turns them into Python's internal string representation. In Python 3.3, that representation doesn't even have a name because it's not a standard encoding.

So you want the decode argument to match the encode argument.

--
http://mail.python.org/mailman/listinfo/python-list
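
To make the encode/decode point concrete, here is a minimal Python 3 sketch of the cases discussed above. The variable names (raw, utf8_error) are illustrative assumptions and do not come from the thread.

```python
# Minimal sketch of the encode/decode behaviour discussed in this thread.
# Variable names are hypothetical, chosen only for illustration.

s = 'α'

# str -> bytes: in ISO-8859-7, α is the single byte 0xE1.
raw = s.encode('iso-8859-7')
assert raw == b'\xe1'

# Decoding with the same codec that produced the bytes round-trips.
assert raw.decode('iso-8859-7') == 'α'

# Decoding with a different single-byte codec "succeeds",
# but yields the wrong character.
assert raw.decode('iso-8859-1') == 'á'

# Decoding as UTF-8 fails: 0xE1 announces a multi-byte sequence,
# and no continuation bytes follow it.
utf8_error = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    utf8_error = exc

# ASCII characters encode to the same byte in both encodings,
# which is why the 'a' example appeared to work.
assert 'a'.encode('iso-8859-7') == 'a'.encode('utf-8') == b'a'

# The UTF-8 encoding of α is two bytes, not 0xE1.
assert 'α'.encode('utf-8') == b'\xce\xb1'
```

The only combination that returns the original string for non-ASCII text is the one where the codec passed to decode is the same as the one passed to encode.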