Thomas Troeger wrote:
I've stumbled over a problem with Windows Locale ID information and codepages. I'm writing a Python application that parses a CSV file; the format of a line in this file is "LCID;Text1;Text2". Each line can contain a different locale ID (LCID), and the text fields contain data encoded in the codepage associated with that LCID. My current data file contains the codes 1031 for German and 1033 for English US (as listed in http://www.microsoft.com/globaldev/reference/lcid-all.mspx). Unfortunately, I cannot find out which codepage (like cp1252 or whatever) belongs to which LCID.

My question is: How can I convert this data into something more reasonable like Unicode? Basically, what I want is something like "Text1;Text2", with both fields encoded as UTF-8. Can this be done with Python? How can I find out which codepage I have to use for 1033 and 1031?


The GetLocaleInfo API call can tell you which codepage belongs to an LCID (ask it for LOCALE_IDEFAULTANSICODEPAGE):

http://msdn.microsoft.com/en-us/library/ms776270(VS.85).aspx

You'll need to use ctypes (or write a C extension) to
use it. Be aware that if it doesn't succeed you may need
to fall back on cp 65001 -- UTF-8.
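
Something along these lines might do it with ctypes. This is only a rough sketch: the helper names (ansi_codepage_for_lcid, codec_name, convert_line) are made up for illustration, and I'm assuming that a failed call or a reported codepage of 0 should both fall back to UTF-8. Off Windows the lookup simply returns the fallback, since the API isn't there.

```python
import sys

LOCALE_IDEFAULTANSICODEPAGE = 0x1004  # LCTYPE constant from winnls.h
UTF8_CODEPAGE = 65001                 # fallback suggested above


def ansi_codepage_for_lcid(lcid):
    """Return the default ANSI codepage number for lcid, or 65001."""
    if sys.platform != "win32":
        return UTF8_CODEPAGE  # GetLocaleInfoW only exists on Windows
    import ctypes
    buf = ctypes.create_unicode_buffer(16)
    written = ctypes.windll.kernel32.GetLocaleInfoW(
        lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf))
    # Assumption: treat failure (0 chars written) and codepage "0"
    # the same way, by falling back to UTF-8.
    if written == 0 or not buf.value.isdigit() or int(buf.value) == 0:
        return UTF8_CODEPAGE
    return int(buf.value)


def codec_name(codepage):
    """Map a Windows codepage number to a Python codec name."""
    return "utf-8" if codepage == UTF8_CODEPAGE else "cp%d" % codepage


def convert_line(raw, lookup=ansi_codepage_for_lcid):
    """Turn the bytes b'LCID;Text1;Text2' into b'Text1;Text2' in UTF-8."""
    lcid, _, rest = raw.partition(b";")
    codec = codec_name(lookup(int(lcid.decode("ascii"))))
    return rest.decode(codec).encode("utf-8")
```

Read the file in binary mode and pass each line to convert_line. The lookup argument is there so you can swap in a plain dict lookup (or a stub) if you'd rather not call the API for every line.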

TJG