Ksenia Marasanova wrote:
> Hi,
>
> I have a little problem with encoding. Was hoping maybe anyone can
> help me to solve it.
>
> There is some amount of data in a database (PG) that must be inserted
> into Excel sheet and emailed. Nothing special, everything works.
> Except that non-ascii characters are not displayed properly.
> The data is stored as XML into a text field.

This sentence doesn't make much sense. Explain.

> When I use pgsql it's
> displayed good in the terminal. Now I run my script and print data
> with "print" statement, still good.

Instead of "print data", do "print repr(data)" and show us what you get. What *you* see on the screen is not much use for diagnosis; it's the values of the bytes in the file that matter.

> Then I use pyXLWriter to write the
> sheet,

Open the spreadsheet with Microsoft Excel, copy-and-paste some data to a Notepad window, save the Notepad file as Unicode type named (say) "junk.u16", then at the Python interactive prompt do this:

file("junk.u16", "rb").read().decode("utf16")

and show us what you get.

> and Python email package to email it... and the resulting sheet
> is not good:

E-mailed how? To whom? [I.e. what country / what cultural background / on what machine / what operating system / viewed using what software]

>
> Г is displayed instead of ü (for example)

You are saying (in effect) U+0413 (Cyrillic upper case letter GHE) is displayed instead of U+00FC (Latin small letter U with diaeresis). OK, we'd already guessed your background from your name :-)

However, what you see isn't necessarily what you've got. How do you know it's not U+0393 (Greek capital letter GAMMA) or something else that looks the same? Could even be from a line-drawing set (top left corner of a box). What you need to do is find out the ordinal of the character being displayed.

This type of problem arises when a character is written in one encoding and viewed using another. I've had a quick look through various likely-suspect 8-bit character sets (e.g. Latin1, KOI-8, cp1251, cp1252, various DOS (OEM) code-pages) and I couldn't see a pair of encodings that would reproduce anything like your "umlauted-u becomes gamma-or-similar" problem. Please supply more than 1 example.
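
The two checks suggested above (look at ordinals rather than glyphs, then hunt for a written-in-one-encoding, viewed-in-another pair) can be sketched roughly as follows. This is only an illustration, written for Python 3 rather than the Python 2 used elsewhere in this thread. The names describe, find_encoding_pairs and CANDIDATES are hypothetical, and the list of candidate encodings is an assumption, not an exhaustive set.

```python
import unicodedata

# Assumed list of "likely suspect" encodings; extend it as needed.
CANDIDATES = ["latin-1", "cp1252", "cp1251", "koi8-r",
              "cp437", "cp850", "cp866", "utf-8"]


def describe(text):
    """Return (code point, Unicode name) for each character in text.

    This answers "what is the ordinal of the character being displayed?"
    without relying on what the glyph happens to look like.
    """
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
            for ch in text]


def find_encoding_pairs(intended, displayed, encodings=CANDIDATES):
    """List (written_as, viewed_as) pairs that turn `intended` into `displayed`.

    Each pair means: encode `intended` with written_as, decode the resulting
    bytes with viewed_as, and you get exactly `displayed`.
    """
    pairs = []
    for written_as in encodings:
        try:
            raw = intended.encode(written_as)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the intended text
        for viewed_as in encodings:
            if viewed_as == written_as:
                continue
            try:
                seen = raw.decode(viewed_as)
            except UnicodeDecodeError:
                continue  # these bytes are not valid in viewed_as
            if seen == displayed:
                pairs.append((written_as, viewed_as))
    return pairs


if __name__ == "__main__":
    # Inspect the two characters mentioned in the report.
    print(describe("\u00fc\u0413"))
    # A well-known mismatch: UTF-8 bytes viewed as a Western 8-bit code page.
    print(find_encoding_pairs("\u00e9", "\u00c3\u00a9"))
```

Feed it the full displayed string, not just the first character: a multi-byte encoding viewed through an 8-bit one produces several characters per original character, which is one more reason to supply more than one example.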