Stephen Hansen wrote:
On Thu, Oct 15, 2009 at 4:43 PM, Stef Mientki <stef.mien...@gmail.com> wrote:

    hello,

    By writing the following unicode string (I hope it can be sent on
    this mailing list)

      Bücken

    to a file

       fh.write(line)

    I get the following error:

     UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc'
    in position 9: ordinal not in range(128)

    How should I write such a string to a file?


First, you have to understand that a file never really contains unicode -- not in the way that it exists in memory / in Python when you type line = u'Bücken'. It contains a series of bytes that are an encoded form of that abstract unicode data.

There are various encodings you can use -- UTF-8 and UTF-16 are in my experience the most common. UTF-8 is an ASCII superset, and it's the one I see most often.
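To make the difference concrete, here is a minimal sketch showing that the same unicode string yields different byte sequences depending on the encoding chosen (using the string from the question):

```python
# The same unicode string, encoded two ways, yields different bytes.
text = u'B\xfccken'  # "Bücken", the string from the question

utf8_bytes = text.encode('utf-8')    # 'ü' (U+00FC) becomes the two bytes 0xC3 0xBC
utf16_bytes = text.encode('utf-16')  # two bytes per character, preceded by a BOM

assert utf8_bytes == b'B\xc3\xbccken'
assert len(utf8_bytes) == 7 and len(utf16_bytes) == 14
```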

So, you can do:

  import codecs
  f = codecs.open('filepath', 'w', 'utf-8')
  f.write(line)

To read such a file, you'd do codecs.open as well, just with a 'r' mode and not a 'w' mode.
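Putting the write and read halves together, a minimal round trip might look like this (the file path is made up for the demonstration):

```python
import codecs
import os
import tempfile

# Hypothetical path, used only for this demonstration.
path = os.path.join(tempfile.gettempdir(), 'unicode_demo.txt')

line = u'B\xfccken'

# Write: codecs.open encodes the unicode string to UTF-8 bytes on the way out.
f = codecs.open(path, 'w', 'utf-8')
f.write(line)
f.close()

# Read: the same codec decodes the bytes back into a unicode string.
f = codecs.open(path, 'r', 'utf-8')
round_tripped = f.read()
f.close()

assert round_tripped == line
```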
Thanks guys,
I didn't know about the codecs module,
and codecs seems to be a good solution,
at least it can safely write a file.
But now I have to open that file in Excel 2000 ... 2007,
and I get something completely wrong.
After changing the codec to latin-1 or windows-1252,
everything works fine.

Which of the two should I use, latin-1 or windows-1252?
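For what it's worth, the two encodings agree on this particular string: windows-1252 differs from latin-1 only in the 0x80-0x9F range, where it replaces C1 control codes with printable characters such as the euro sign. A small sketch of the difference:

```python
text = u'B\xfccken'

# 'ü' (0xFC) has the same byte value in both encodings,
# so for this string the two codecs are interchangeable.
assert text.encode('latin-1') == text.encode('windows-1252')

# The euro sign exists in windows-1252 (byte 0x80) but not in latin-1.
euro = u'\u20ac'
assert euro.encode('windows-1252') == b'\x80'
try:
    euro.encode('latin-1')
except UnicodeEncodeError:
    pass  # expected: latin-1 has no euro sign
```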

And a more general question: how should I organize my Python programs?
In general I have data coming from Excel, Delphi, SQLite.
In Python I always use wxPython, so I'm forced to use unicode.
My output often needs to be exported to Excel, SPSS, SQLite.
So would this be a good design?

Excel    |      convert        wxPython      convert        Excel
Delphi   |===>    to      ===>   in     ===>   to      ===> SQLite
SQLite   |      unicode        unicode       latin-1        SPSS

thanks,
Stef Mientki


Now, that uses a file object created with the "codecs" module, which operates on unicode streams. It will automatically take any passed-in unicode strings, encode them in the specified encoding (utf-8), and write the resulting bytes out.

You can also do that manually with a regular file object, via:

  f.write(line.encode("utf8"))

If you are reading such a file later with a normal file object (i.e., not one created with codecs.open), you would do:

  f = open('filepath', 'rb')
  byte_data = f.read()
  uni_data = byte_data.decode("utf8")

That will convert the byte-encoded data back to real unicode strings. Be sure to do this, even if it doesn't seem like you need to, if the file contains encoded unicode data (a thing you can only know based on the documentation of whatever produced that file). For example, a UTF-8 encoded file might look and work like a completely normal ASCII file, but if it's really UTF-8, eventually your code will break that one time someone puts in a non-ASCII character. Since UTF-8 is an ASCII superset, it's indistinguishable from ASCII until it contains a non-ASCII character.
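That last point can be demonstrated directly: ASCII-only text produces identical bytes under both codecs, and decoding UTF-8 bytes as ASCII only fails once a non-ASCII character shows up.

```python
# Pure ASCII text encodes to the same bytes under ASCII and UTF-8,
# so the two encodings are indistinguishable at this point.
ascii_text = u'Becken'
assert ascii_text.encode('ascii') == ascii_text.encode('utf-8')

# Once a non-ASCII character appears, decoding as ASCII breaks:
data = u'B\xfccken'.encode('utf-8')
try:
    data.decode('ascii')
except UnicodeDecodeError:
    pass  # the 0xC3 byte is outside range(128)

# Decoding with the correct codec recovers the original string.
assert data.decode('utf-8') == u'B\xfccken'
```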


HTH,

--S

--
http://mail.python.org/mailman/listinfo/python-list
