> 2. If this returns "C" or anything without 'utf8' in it, then things start
> to go downhill:
> 2a. The app assumes unicode objects internally. i.e. Whenever there is
> a "string like this" in a var it's supposed to be unicode. Whenever
> something comes into the app (from a filename, a file's contents, the
> command-line) it's assumed to be a byte-string that I decode("utf8") on
> before placing it into my objects etc.
That's a bug in the app. It shouldn't assume that environment variables
are UTF-8. Instead, it should assume that they are in the locale's
encoding, and compute that encoding with locale.getpreferredencoding.

> 2b. Because of 2a and if the locale is not 'utf8 aware' (i.e. "C") I start
> getting all the old 'ascii' unicode decode errors. This happens at every
> string operation, at every print command and is almost impossible to fix.

If you print non-ASCII strings to the terminal, and you can't be certain
that the terminal supports the encoding in the string, and you can't
reasonably deal with the exceptions, you should accept moji-bake, by
specifying the "replace" error handler when converting strings to the
terminal's encoding.

> 3. I made the decision to check the locale and stop the app if the return
> from getlocale is (None,None).

I would avoid locale.getlocale. It's a pointless function (IMO). Also,
what's the purpose of this test?

> Does anyone have some ideas? Is there a universal "proper" locale that we
> could set a system to *before* the Debian build stuff starts? What would
> that be - en_US.utf8?

Your program definitely, absolutely must work in the C locale. Of course,
you cannot have any non-ASCII characters in that locale, so deal with it.

If you have solved that, chances are high that it will work in other
locales as well (but be sure to try Turkish, as that gives a surprising
meaning to "I".lower()).

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
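
For illustration, the two suggestions in the reply (decode incoming bytes with the locale's encoding, and encode terminal output with the "replace" error handler) might look roughly like the sketch below. It assumes Python 3, where str is the Unicode type and bytes is the byte string. The helper names decode_input and encode_for_terminal are hypothetical and do not come from the thread; only locale.getpreferredencoding and the "replace" error handler are taken from the reply.

```python
import locale


def decode_input(raw, encoding=None, errors="strict"):
    """Decode bytes coming from outside the program.

    Hypothetical helper: uses the locale's preferred encoding
    instead of hard-coding UTF-8.
    """
    if encoding is None:
        # False: query the encoding without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors)


def encode_for_terminal(text, encoding=None):
    """Encode text for output, replacing what cannot be represented.

    Hypothetical helper: with the "replace" handler, unencodable
    characters become "?" instead of raising UnicodeEncodeError.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, "replace")


if __name__ == "__main__":
    # In the C locale this prints a "?" in place of the accented
    # letter rather than failing.
    print(encode_for_terminal("caf\u00e9"))
```

Passing an explicit encoding, as in encode_for_terminal("caf\u00e9", "ascii"), makes it possible to check the C-locale behaviour without changing the environment.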