Frank Stajano wrote:
> A simple unicode question. How do I print?
>
> Sample code:
>
> # -*- coding: utf-8 -*-
> s1 = u"héllô wórld"
> print s1
> # Gives UnicodeEncodeError: 'ascii' codec can't encode character
> # u'\xe9' in position 1: ordinal not in range(128)
>
>
> What I actually want to do is slightly more elaborate: read from a text
> file which is in utf-8, do some manipulations of the text and print the
> result on stdout. I understand I must open the file with
>
> f = codecs.open("input.txt", "r", "utf-8")
>
> but then I get stuck as above.
>
> I tried
>
> s2 = s1.encode("utf-8")
> print s2
>
> but got
>
> héllô wórld

Which is perfectly alright: s2 holds correct UTF-8 bytes. It's just that your
terminal isn't set up to decode UTF-8 but expects some other encoding, like
latin1.

> Then, in the hope of being able to write the string to a file if not to
> stdout, I also tried
>
>
> import codecs
> f = codecs.open("out.txt", "w", "utf-8")
> f.write(s2)
>
> but got
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)

Instead of writing s2 (which is a byte string!), write s1. It will work.

The error you get stems from f.write wanting a unicode object, but s2 is a
byte string (you explicitly encoded it before), so Python first tries to
decode the byte string to a unicode string using the default encoding, ascii.
This of course fails.

Diez
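
To make the points above concrete, here is a minimal sketch. It is not from the original exchange: the scratch file path and the helper name encode_for_output are hypothetical, and latin-1 is only assumed as an example of a non-UTF-8 terminal encoding. It is written so that it should run on Python 2.6+ as well as Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import sys
import tempfile

s1 = u"h\xe9ll\xf4 w\xf3rld"   # unicode object, same text as u"héllô wórld"
s2 = s1.encode("utf-8")        # byte string holding the UTF-8 encoding of s1

# What a terminal shows if it decodes those UTF-8 bytes as latin-1
# (latin-1 is an assumption here; the real terminal encoding may differ).
garbled = s2.decode("latin-1")

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# The codecs writer does the encoding itself, so pass it the unicode object.
with codecs.open(path, "w", "utf-8") as f:
    f.write(s1)

# Reading through the same codec gives a unicode object back.
with codecs.open(path, "r", "utf-8") as f:
    round_trip = f.read()

# The bytes on disk are the same ones s1.encode("utf-8") produces.
with open(path, "rb") as f:
    raw = f.read()


def encode_for_output(text, encoding=None):
    """Hypothetical helper: encode text explicitly before printing it."""
    # sys.stdout.encoding can be None (for example when output is piped);
    # falling back to UTF-8 is an assumption, not a universal default.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # "replace" substitutes "?" for characters the encoding cannot represent.
    return text.encode(enc, "replace")
```

Under Python 2, as in the original post, something like `print encode_for_output(s1)` would then send bytes in whatever encoding the terminal reports, instead of relying on the ascii default.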