Martin, I really appreciate your reply. I have been working in a vacuum on this and without any experience. I hope you don't mind if I ask you a bunch of questions. If I can get over some conceptual 'humps' then I'm sure I can produce a better app.
> That's a bug in the app. It shouldn't assume that environment variables
> are UTF-8. Instead, it should assume that they are in the locale's
> encoding, and compute that encoding with locale.getpreferredencoding.

I see what you are saying and agree, but I am confused about files and filenames. My app has to handle font files which can come from anywhere. If the locale (locale.getpreferredencoding) returns something like "ANSI" and I am doing an os.listdir() then I lose the plot a little...

It seems to me that filenames are like snapshots of the locales where they originated. If there's a font file from India and I want to open it on my system in South Africa (and I have LANG=C) then it seems that it's impossible to do. If I access the filename it throws a UnicodeDecodeError. If I use 'replace' or 'ignore' then I am mangling the filename and I won't be able to open it. The same goes for adding 'foreign' filenames to paths with any kind of string operation.

My (admittedly uninformed) conception is that by forcing the app to always use utf8 I can access any filename in any encoding. The problem seems to be that I cannot know *what* encoding that particular filename is in (and I still get encode/decode mixed up, being very new to it all).

Am I right? Wrong? Deluded? :) Please fill me in.

> If you print non-ASCII strings to the terminal, and you can't be certain
> that the terminal supports the encoding in the string, and you can't
> reasonably deal with the exceptions, you should accept moji-bake, by
> specifying the "replace" error handler when converting strings to the
> terminal's encoding.

I went through this exercise recently and had no joy. It seems the string I chose to use simply would not render, even under 'ignore' and 'replace'. It's really frustrating because I don't speak a non-ASCII language and so can't know if I am testing real-world strings or crazy Tolkien strings.

Another aspect of this is wxPython.
My app requires the unicode build so that strings have some hope of displaying on the widgets. If I access a font file and fetch the family name, that can be encoded in any way (again unknown), and I want to fetch it as 'unicode' and pass it to the widgets and not worry about what's really going on. Given that, I thought I'd extend the 'utf8 only' concept to the app in general. I am sure I am wrong, but I feel cornered at the moment.

> > 3. I made the decision to check the locale and stop the app if the return
> > from getlocale is (None,None).
> I would avoid locale.getlocale. It's a pointless function (IMO).

Could you say why? Here's my use of it:

    locale.setlocale( locale.LC_ALL, "" )
    loc = locale.getlocale()[0]
    if loc == None:
        loc = locale.getlocale()
        if loc == (None, None):
            print localeHelp
            # not utf-8 (I think)
            raise SystemExit
    # Now gettext
    domain = "all"
    gettext.install( domain, localedir, unicode = True )
    lang = gettext.translation(domain, localedir, languages = [loc] )
    lang.install(unicode = True )

So, I am using getlocale to get the language code (easy, no?) to pass in a list to the gettext.translation function.

> Your program definitely, absolutely must work in the C locale. Of
> course, you cannot have any non-ASCII characters in that locale, so
> deal with it.

This would mean cutting out a percentage of the external font files that can be used by the app. Is there no modern standard regarding the LANG variable and locales these days? My locale -a reports a bunch of xx_XX.utf8 locales. Does it even make sense to use a non-utf8 locale anymore?

> If you have solved that, chances are high that it will work in other
> locales as well (but be sure to try Turkish, as that gives a
> surprising meaning to "I".lower()).

Oh boy, this gives me cold chills. I don't have the resources to start worrying about every single language's edge cases. This is kind of why I was leaning towards a "use a utf8 locale please" approach.
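
To make the filename question above concrete, here is a minimal sketch of the alternative to "force utf8 everywhere" as I understand it: keep each filename as the raw bytes the filesystem returned, use those bytes for opening the file, and decode a separate copy with 'replace' purely for display. It assumes Python 3, where os.listdir accepts a bytes path and returns bytes names. The names display_name and list_font_files are made up for illustration and are not taken from the app.

```python
import os


def display_name(raw_name):
    """Best-effort text for widgets and logs; never use it to open a file."""
    # Assumption: try UTF-8 and accept replacement characters on failure.
    return raw_name.decode("utf-8", "replace")


def list_font_files(directory):
    """Return (raw_path, shown_name) pairs for every entry in directory.

    raw_path is undecoded bytes and can be passed straight back to open().
    shown_name is a str that may contain U+FFFD where decoding failed.
    """
    raw_dir = os.fsencode(directory)
    pairs = []
    for raw_name in sorted(os.listdir(raw_dir)):
        raw_path = os.path.join(raw_dir, raw_name)
        pairs.append((raw_path, display_name(raw_name)))
    return pairs
```

The point of the split is that the mangled copy is only ever shown to the user, so 'replace' cannot stop the file from being opened, whatever encoding the name was originally written in.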
\d
--
Fonty Python and other dev news at: http://otherwiseingle.blogspot.com/