On Sun, 09 Jun 2013 02:38:13 -0700, Νικόλαος Κούρας wrote:

> s = 'α'
> s = s.encode('iso-8859-7').decode('utf-8')
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0:
> unexpected end of data
>
> Why this error? because 'a' ordinal value > 127 ?

Look at it this way... consider encoding and decoding to be like
translating from one language to another.

Suppose you start with the English word "street". You encode it to German
by looking it up in an English-To-German dictionary:

street -> Straße

Then you decode the German by looking "Straße" up in a German-To-English
dictionary:

Straße -> street

and everything is good. But suppose that after encoding the English to
German, you get confused, and think that it is Italian, not German. So
when it comes to decoding, you try to look up 'Straße' in an
Italian-To-English dictionary, and discover that there is no such thing as
letter ß in Italian. So you cannot look the word up, and you get
frustrated and shout "this is rubbish, there's no such thing as ß, that's
not a letter!" Not in Italian, true, but it is a perfectly good letter in
German. You're just looking it up in the wrong dictionary.

Same thing with UTF-8. You encoded the string 'α' by looking it up in the
"Unicode To ISO-8859-7 bytes" dictionary, which gave you the single byte
0xe1. Then you tried to decode it by looking for that byte in the "UTF-8
bytes To Unicode" dictionary. But in UTF-8, 0xe1 is only valid as the
first byte of a three-byte sequence, so you can't find byte 0xe1 on its
own. Python shouts "this is rubbish, there's no such thing as 0xe1 on its
own in UTF-8!" and raises UnicodeDecodeError. That is also why the message
says "unexpected end of data".

Sometimes you don't get an exception. Suppose that you are encoding from
French to German:

qui -> die

(both words mean "who" in English)

Now if you get confused, and decode the word 'die' by looking it up in an
English-To-French dictionary, instead of German-To-French, you get:

die -> mourir

So instead of getting 'qui' back again, you get 'mourir'. This is like
mojibake: the results are garbage, but there is no exception raised to
warn you.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
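
P.S. A minimal sketch of the three cases above in Python 3, for anyone who wants to see the bytes next to the dictionary analogy. The variable names (raw, same, error, garbled) and the choice of Latin-1 as the "wrong dictionary that does not complain" are assumptions picked for illustration; they are not from the original code.

```python
s = 'α'
raw = s.encode('iso-8859-7')       # a single byte: b'\xe1'

# Right dictionary: decoding with the codec used for encoding
# gives the original string back.
same = raw.decode('iso-8859-7')

# Wrong dictionary, loud failure: 0xe1 can only start a three-byte
# UTF-8 sequence, and no further bytes follow it.
try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Wrong dictionary, silent failure (mojibake): Latin-1 assigns a
# character to every byte, so no exception is raised.
garbled = raw.decode('latin-1')

# ascii() keeps the output printable on any terminal.
print(ascii(raw), ascii(same), ascii(garbled))
print(type(error).__name__)
```

The Latin-1 case is the 'mourir' situation: the decode succeeds, but the result is U+00E1 (a Latin a with an acute accent) rather than the Greek alpha you started with.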