Re: string u'hyv\xe4' to file as 'hyvä'

Alex Willmer Mon, 27 Dec 2010 02:03:20 -0800

On Dec 27, 6:47 am, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:
> "gintare" <g.statk...@gmail.com> wrote in message
> > In file i find 'hyv\xe4' instead of hyvä.
>
> When you open a file with codecs.open(), it expects Unicode strings to be
> written to the file.  Don't encode them again.  Also, .writelines() expects
> a list of strings.  Use .write():
>
>     import codecs
>     item=u'hyv\xe4'
>     F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
>     F.write(item)
>     F.close()


Gintare, Mark's code is correct. When you are reading the file back
make sure you understand what you are seeing:

>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'

That might look as though item2 is 7 characters long and contains
a backslash followed by x, e, 4. However, item2 is identical to item:
both contain 4 characters, the final one being a-umlaut. The
interactive prompt displays the repr() of the string, which uses a
backslash escape because printing a non-ascii character might fail.
You can see the character directly if your Python session is running
in a terminal (or GUI) that can handle non-ascii characters:

>>> print item2
hyvä
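
If your terminal cannot display the character, you can still check
the lengths rather than trusting what is echoed. The following is
only a rough sketch, not part of Mark's code: the temporary path and
the names raw and escaped are my own, chosen for illustration. It
writes the string, reads it back, and compares the text with the
bytes on disk.

```python
import codecs
import os
import tempfile

item = u'hyv\xe4'  # 4 characters; the last one is a-umlaut (U+00E4)

# Hypothetical scratch location, so the sketch does not need /opt.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# Write the unicode string; codecs.open() does the UTF-8 encoding.
f = codecs.open(path, 'w', 'utf8')
f.write(item)
f.close()

# Read it back; codecs.open() decodes the bytes to unicode again.
f2 = codecs.open(path, 'r', 'utf8')
item2 = f2.read()
f2.close()

# The raw bytes on disk: a-umlaut takes two bytes in UTF-8.
with open(path, 'rb') as fh:
    raw = fh.read()

# The escaped form is only a display of the string, not its content.
escaped = item2.encode('unicode_escape')

print(item2 == item)  # True: same text as was written
print(len(item2))     # 4 characters in the string
print(len(raw))       # 5 bytes in the file
print(len(escaped))   # 7 characters in the escaped display form
```

So the 7 only describes the escaped display, the 5 is the UTF-8 file
size, and the string itself has 4 characters.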
-- 
http://mail.python.org/mailman/listinfo/python-list
