On Wed, 11 May 2005 14:08:08 -0500, Skip Montanaro <[EMAIL PROTECTED]> wrote:
>
>    >> Based on the requests I've seen here and on the [EMAIL PROTECTED]
>    >> mailing list, it appears people are certainly generating CSV files
>    >> which contain Unicode-encoded data.
>
>    Fredrik> in what encodings?
>
>I've seen hints about iso-8859-1/iso-8859-15 and mention that Excel 2000
>supports utf-8.

I have Excel 2002 and have done some experimentation. It "supports" utf-8
only to the extent that most times it doesn't mangle the data (i.e. you can
save it again without loss); you just can't make any sense out of what's on
the screen. Specifically:

open a file with CSV extension: Excel assumes blindly that it's encoded
according to your locale (e.g. cp1252).

open a file with TXT extension: Excel gives you the option of specifying
which one of a large number of *legacy* encodings -- yes, that's correct,
utf-* are not on the list!

NOTE: the above applies even if you have a utf-8-encoded BOM at the start
of the file. This behaviour appears to be Excel-specific; MS Word, Wordpad
and even the humble Notepad recognise the utf-8-encoded BOM and display
sensibly (with a Unicode font, of course).

>Whether Excel can dump csv files in utf-8 or not, I don't
>know, though I'd suppose so.

Unfortunately, your supposition is incorrect. There is no way of specifying
the encoding directly. The nearest available options are:

(1) csv: encoded in your locale-specific legacy encoding. "Illegal"
    characters are silently replaced by "?" on Windows and (I deduce)
    underscore on a Macintosh.
(2) text: ditto
(3) Unicode text: utf-16 -- it *does* subsequently open these correctly,
    i.e. silently detects the encoding and displays properly.
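
For anyone wanting to handle option (3) from the Python side, here is a rough sketch rather than a tested recipe. It assumes Python 3 and only the standard library. It also assumes that "Unicode text" means tab-delimited utf-16 with a BOM (my understanding of that format, not something stated above) and that Excel treats a file written this way like one it saved itself. The function names are made up for illustration. The last helper imitates the "?" substitution described for option (1), using cp1252 as a stand-in for the locale encoding.

```python
import csv
import os
import tempfile


def write_unicode_text(path, rows):
    """Write rows as tab-delimited utf-16 text (hypothetical helper).

    Python's "utf-16" codec writes a BOM first, which is what lets a
    reader detect the encoding without being told.
    """
    with open(path, "w", encoding="utf-16", newline="") as f:
        csv.writer(f, dialect="excel-tab").writerows(rows)


def read_unicode_text(path):
    """Read a tab-delimited utf-16 file back into a list of rows."""
    with open(path, "r", encoding="utf-16", newline="") as f:
        return list(csv.reader(f, dialect="excel-tab"))


def legacy_csv_bytes(text, encoding="cp1252"):
    """Imitate the lossy legacy save: unencodable characters become "?".

    cp1252 is only an assumed example of a locale-specific encoding.
    """
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    rows = [["name", "city"], ["Łukasz", "Kraków"]]
    path = os.path.join(tempfile.mkdtemp(), "people.txt")
    write_unicode_text(path, rows)
    print(read_unicode_text(path) == rows)   # round trip is lossless
    print(legacy_csv_bytes("Łukasz"))        # the "Ł" is not in cp1252
```

The utf-16 round trip keeps every character, whereas the legacy-encoding path loses anything outside the code page, which is the reason to prefer option (3) when the data is not all in one locale's repertoire.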