"Mark Tolonen" <metolone+gm...@gmail.com> wrote in message
news:hbbo6d$6u...@ger.gmane.org...
"Kee Nethery" <k...@kagi.com> wrote in message
news:aaab63c6-6e44-4c07-b119-972d4f49e...@kagi.com...
On Oct 16, 2009, at 5:49 PM, Stephen Hansen wrote:
On Fri, Oct 16, 2009 at 5:07 PM, Stef Mientki <stef.mien...@gmail.com>
wrote:
snip
The thing is, I'd be VERY surprised (nay, shocked!) if Excel can't
open a file that is in UTF-8 -- it just might need to be TOLD that it's
UTF-8 when you go and open the file, as UTF-8 looks just like ASCII --
until it contains characters that can't be expressed in ASCII. But I
don't know what type of file it is you're saving.
We found that UTF-16 was required for Excel. It would not "do the right
thing" when presented with UTF-8.
Excel seems to need a UTF-8-encoded BOM (byte order mark) at the start of
the file to recognize that the file is written in UTF-8. This worked for me:
import codecs

f = codecs.open('test.csv', 'wb', 'utf-8')
f.write(u'\ufeff')  # write a BOM
f.write(u'马克,testing,123\r\n')
f.close()
When opened in Excel without the BOM (\ufeff), I got gibberish, but with
the BOM the Chinese characters were displayed correctly.
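As a side note, Python (2.5 and later) also ships a 'utf-8-sig' codec that
writes the BOM for you, so the manual f.write(u'\ufeff') step can be dropped.
A small sketch of the same file written that way (u'\u9a6c\u514b' is the same
two Chinese characters as above, spelled as escapes so the snippet survives
any mail encoding):

```python
import codecs

# 'utf-8-sig' prepends the UTF-8 BOM (EF BB BF) automatically on encode,
# so no explicit u'\ufeff' write is needed
f = codecs.open('test.csv', 'wb', 'utf-8-sig')
f.write(u'\u9a6c\u514b,testing,123\r\n')
f.close()

# check that the BOM really is the first three bytes of the file
with open('test.csv', 'rb') as fh:
    raw = fh.read()
print(repr(raw[:3]))  # the three BOM bytes
```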
Also, it turns out the Python 'utf-16' codec adds a BOM for you, which is
probably why UTF-16 worked for you and UTF-8 didn't:
>>> u'\u0102'.encode('utf-16-be')   # explicit big-endian, no BOM
'\x01\x02'
>>> u'\u0102'.encode('utf-16-le')   # explicit little-endian, no BOM
'\x02\x01'
>>> u'\u0102'.encode('utf-16')      # machine native-endian, with BOM
'\xff\xfe\x02\x01'
>>> u'\u0102'.encode('utf-8')       # no BOM
'\xc4\x82'
>>> u'\ufeff\u0102'.encode('utf-8') # explicit BOM
'\xef\xbb\xbf\xc4\x82'
-Mark
--
http://mail.python.org/mailman/listinfo/python-list