Stephen Hansen wrote:
On Thu, Oct 15, 2009 at 4:43 PM, Stef Mientki <stef.mien...@gmail.com> wrote:
hello,
By writing the following unicode string (I hope it can be sent on
this mailing list)
Bücken
to a file
fh.write(line)
I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc'
in position 9: ordinal not in range(128)
How should I write such a string to a file?
First, you have to understand that a file never really contains
unicode-- not in the way that it exists in memory / in python when you
type line = u'Bücken'. It contains a series of bytes that are an
encoded form of that abstract unicode data.
There are various encodings you can use-- UTF-8 and UTF-16 are in my
experience the most common. UTF-8 is an ASCII superset, and it's the
one I see most often.
So, you can do:
import codecs
f = codecs.open('filepath', 'w', 'utf-8')
f.write(line)
To read such a file, you'd do codecs.open as well, just with a 'r'
mode and not a 'w' mode.
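A minimal round trip with codecs.open might look like this (the filename is just an example):

```python
import codecs

# Write a unicode string; codecs.open encodes it to UTF-8 bytes for us.
f = codecs.open('example.txt', 'w', 'utf-8')
f.write(u'B\xfccken')
f.close()

# Read it back; the same codec decodes the bytes to a unicode string.
f = codecs.open('example.txt', 'r', 'utf-8')
line = f.read()
f.close()
print(line)  # Bücken
```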
Thanks guys,
I didn't know about the codecs module, and it seems to be a good
solution; at least it can safely write a file.
But now I have to open that file in Excel 2000 ... 2007,
and I get something completely wrong.
After changing the codec to latin-1 or windows-1252,
everything works fine.
Which of the two should I use, latin-1 or windows-1252?
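For what it's worth, the two codecs agree wherever latin-1 has a printable character; windows-1252 additionally assigns printable characters (such as the euro sign) to the bytes 0x80-0x9F, which latin-1 reserves for control codes. A quick sketch of the difference:

```python
# Characters like u'\xfc' (ü) encode to the same byte in both codecs.
assert u'B\xfccken'.encode('latin-1') == u'B\xfccken'.encode('windows-1252')

# The euro sign exists in windows-1252 but not in latin-1.
print(u'\u20ac'.encode('windows-1252'))  # b'\x80'
try:
    u'\u20ac'.encode('latin-1')
except UnicodeEncodeError:
    print('latin-1 cannot encode the euro sign')
```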
And a more general question: how should I organize my Python programs?
In general I've data coming from Excel, Delphi, SQLite.
In Python I always use wxPython, so I'm forced to use unicode.
My output often needs to be exported to Excel, SPSS, SQLite.
So would this be a good design?

Excel  |    convert       wxPython      convert      Excel
Delphi | ===>  to    ===>    in    ===>    to   ===> SQLite
SQLite |    unicode       unicode       latin-1      SPSS
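One common pattern matching that diagram is to decode at the input boundary, keep everything unicode internally, and encode only at the output boundary. A sketch with hypothetical file names and a made-up one-column format:

```python
import codecs

# Hypothetical input: pretend Excel exported this windows-1252 file.
with codecs.open('from_excel.csv', 'w', 'windows-1252') as f:
    f.write(u'B\xfccken;12\n')

# 1. Decode external data to unicode at the input boundary.
with codecs.open('from_excel.csv', 'r', 'windows-1252') as f:
    rows = f.read().splitlines()

# 2. Work purely with unicode strings internally (as wxPython expects).
names = [row.split(';')[0] for row in rows]

# 3. Encode only at the output boundary, in whatever the target needs.
with codecs.open('for_spss.csv', 'w', 'windows-1252') as f:
    f.write(u'\n'.join(names))
```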
thanks,
Stef Mientki
Now, that uses a file object created with the "codecs" module, which
operates on logical unicode streams. It will automatically take
any unicode strings passed in, encode them in the specified encoding
(UTF-8), and write the resulting bytes out.
You can also do that manually with a regular file object, via:
f.write(line.encode("utf8"))
If you are later reading such a file with a normal file object (i.e.,
not one created with codecs.open), you would do:
f = open('filepath', 'rb')
byte_data = f.read()
uni_data = byte_data.decode("utf8")
That will convert the byte-encoded data back to real unicode strings.
Be sure to do this, even if it doesn't seem you need to, whenever the
file contains encoded unicode data (a thing you can only know from the
documentation of whatever produced the file). For example, a UTF-8
encoded file might look and work like a completely normal ASCII file,
but if it's really UTF-8, your code will eventually break the one time
someone puts in a non-ASCII character. Since UTF-8 is an ASCII
superset, it's indistinguishable from ASCII until it contains a
non-ASCII character.
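That ASCII-superset property is easy to check directly:

```python
# Pure ASCII text produces identical bytes under ASCII and UTF-8...
s = 'hello'
assert s.encode('ascii') == s.encode('utf-8')

# ...so a UTF-8 file is byte-for-byte plain ASCII until the first
# non-ASCII character shows up (here, ü becomes two bytes).
print(u'B\xfccken'.encode('utf-8'))  # b'B\xc3\xbccken'
```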
HTH,
--S