I'm using Python 2.6 on Windows and having trouble with the charset in gettext. It seems to be so broken that I must be missing something.
When I run msgfmt.py, as far as I can tell it writes no charset information into the .mo file. The .po files in this case are UTF-8 and do contain a charset declaration. Then, when _parse in gettext loads the messages, it does no conversion to Unicode, because it has no charset information. So the message dictionary is actually in UTF-8, despite this comment in the code:

    # Note: we unconditionally convert both msgids and msgstrs to
    # Unicode using the character encoding specified in the charset
    # parameter of the Content-Type header.

ugettext then either returns the translated message as is, which is not Unicode, or tries to convert it to Unicode, which fails because the unicode() call does not specify an encoding.

The _parse code seems to expect to produce a Unicode translation dictionary, and gettext expects to encode Unicode into the current code page, but the message dictionary never gets mapped to Unicode in the first place.

What I want is simply to use UTF-8 .po files and get translations in Unicode.

TIA for any suggestions.

-Jon Peck
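For anyone trying to reproduce this, here is a minimal sketch of the behaviour described above. It assumes the charset reaches GNUTranslations only through the Content-Type line of the msgid "" header entry. build_mo is a hypothetical helper, not msgfmt.py itself. It writes a stripped-down .mo layout (7-word header, two (length, offset) tables, NUL-terminated strings, no hash table), roughly what msgfmt.py emits. GREETING is made-up sample data. The sketch is meant to run on both Python 2.6 and Python 3: Python 2 uses ugettext(), while Python 3's gettext() already returns str.

```python
import gettext
import io
import struct

MO_MAGIC = 0x950412de  # GNU .mo magic number; packed little-endian below


def build_mo(catalog):
    """Hypothetical helper: serialize {msgid_bytes: msgstr_bytes} to .mo bytes."""
    keys = sorted(catalog)
    ids = strs = b''
    entries = []
    for key in keys:
        value = catalog[key]
        entries.append((len(ids), len(key), len(strs), len(value)))
        ids += key + b'\0'
        strs += value + b'\0'
    n = len(keys)
    header_size = 7 * 4
    keystart = header_size + 16 * n      # strings start after both tables
    valuestart = keystart + len(ids)
    table = []
    for id_off, id_len, _, _ in entries:         # original-string table
        table += [id_len, keystart + id_off]
    for _, _, str_off, str_len in entries:       # translation table
        table += [str_len, valuestart + str_off]
    out = struct.pack('<7I', MO_MAGIC, 0, n, header_size,
                      header_size + 8 * n, 0, 0)
    out += struct.pack('<%dI' % len(table), *table)
    return out + ids + strs


def load_catalog(mo_bytes):
    """Parse .mo bytes with the standard-library GNUTranslations class."""
    return gettext.GNUTranslations(io.BytesIO(mo_bytes))


def translate(translations, message):
    # Python 2: ugettext() is the Unicode-returning lookup.
    # Python 3: ugettext() is gone and gettext() returns str.
    lookup = getattr(translations, 'ugettext', translations.gettext)
    return lookup(message)


# The charset is taken only from this header entry (msgid "").
UTF8_HEADER = b'Content-Type: text/plain; charset=UTF-8\n'
GREETING = u'Gr\u00fc\u00dfe'  # hypothetical sample translation

if __name__ == '__main__':
    with_header = load_catalog(build_mo({
        b'': UTF8_HEADER,
        b'Hello': GREETING.encode('utf-8'),
    }))
    print(with_header.charset())
    print(translate(with_header, 'Hello') == GREETING)

    # No msgid "" entry, so no charset is recorded.
    no_header = load_catalog(build_mo({b'Hello': b'Hi'}))
    print(no_header.charset())
```

If a .mo file produced by msgfmt.py behaves like the second catalog, with charset() returning None, then the header entry from the .po file never made it into the compiled file. That would match the symptom described above.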