On Dec 27, 6:47 am, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:
> "gintare" <g.statk...@gmail.com> wrote in message
> > In file i find 'hyv\xe4' instead of hyvä.
>
> When you open a file with codecs.open(), it expects Unicode strings to be
> written to the file.  Don't encode them again.  Also, .writelines() expects
> a list of strings.  Use .write():
>
> import codecs
> item=u'hyv\xe4'
> F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
> F.write(item)
> F.close()

Gintare, Mark's code is correct. When you read the file back, make sure
you understand what you are seeing:

>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'

That might look as though item2 is 7 characters long and contains a
backslash followed by x, e, 4. However, item2 is identical to item: both
contain 4 characters, the last one being a-umlaut. The interactive prompt
displays the repr of the string, which uses a backslash escape, because
printing a non-ASCII character might fail. You can see this directly if
your Python session is running in a terminal (or GUI) that can handle
non-ASCII characters:

>>> print item2
hyvä
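
If you want to convince yourself without relying on what the terminal
displays, here is a minimal sketch of the round trip. It assumes a
throwaway file in a temporary directory (my choice, so that nothing is
written under /opt); the names path and raw are only for illustration.

```python
import codecs
import os
import tempfile

item = u'hyv\xe4'  # 4 characters; \xe4 is a-umlaut

# Hypothetical scratch location, used here instead of /opt/finnish.txt.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# Write the Unicode string as-is; codecs.open() does the UTF-8 encoding.
F = codecs.open(path, 'w', 'utf8')
F.write(item)
F.close()

# Read it back through the same codec: we get a Unicode string again.
F2 = codecs.open(path, 'r', 'utf8')
item2 = F2.read()
F2.close()

# Look at the raw bytes on disk, bypassing the codec.
f = open(path, 'rb')
raw = f.read()
f.close()

print(len(item2))     # number of characters in the decoded string
print(item2 == item)  # the round trip preserves the string
print(len(raw))       # number of bytes in the file
```

The decoded string has 4 characters, while the file holds 5 bytes,
because UTF-8 stores a-umlaut as two bytes (0xC3 0xA4). Neither count is
7, so the backslash, x, e, 4 you saw exist only in the repr, not in the
string or in the file.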