Hello all,

I am trying to internationalize my Tkinter program using gettext and have run into various problems, so it looks like this is not a trivial task. After some "research" I came up with a few rules that I hope will let me avoid further encoding trouble, but I would feel more confident if some of the experts here could have a look at my thoughts so far and tell me if I am still going wrong somewhere (BTW, the program is supposed to run on Linux only). Here is what I have so far:

1. Use unicode instead of byte strings wherever possible. This can be a little tricky, because in some situations I cannot know in advance if a certain string is unicode or byte string; I wrote a helper module for this which defines convenience methods for fail-safe decoding/encoding of strings and a Tkinter.UnicodeVar class which I use to convert user input to unicode on the fly (see the code below).

2. So I will have to call gettext.install() with unicode=1.

3. Make sure to NEVER mix unicode and byte strings within one expression.

4. In order to maintain code readability it's better to risk excess decode/encode cycles than having one too few.

5. File operations seem to be delicate; at least I got an error when I passed a filename that contains special characters as unicode to os.access(), so I guess that whenever I do file operations (os.remove(), shutil.copy() ...) the filename should be encoded back into system encoding before. The filename manipulations by the os.path methods seem to be simply string manipulations, so encoding the filenames doesn't seem to be necessary.

6. Messages that are printed to stdout should be encoded first, too; the same with strings I use to call external shell commands.
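
To make rules 1, 3 and 6 a bit more concrete, here is a minimal sketch of the "decode on the way in, encode on the way out" idea. The names to_unicode, to_bytes and build_message are made up for this illustration, and the encoding is passed in explicitly rather than guessed. The sketch avoids Python-2-only syntax so that it also runs on newer interpreters; it is not meant as the final helper module.

```python
# Sketch only: to_unicode, to_bytes and build_message are hypothetical names.
text_type = type(u'')   # "unicode" on Python 2, "str" on Python 3


def to_unicode(value, encoding, errors='strict'):
    """Return value as a unicode string; byte strings are decoded."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding, errors)


def to_bytes(value, encoding, errors='strict'):
    """Return value as a byte string; unicode strings are encoded."""
    if isinstance(value, text_type):
        return value.encode(encoding, errors)
    return value


def build_message(template, filename, encoding):
    """Rule 3: decode every part first, then combine unicode objects only."""
    return to_unicode(template, encoding) % to_unicode(filename, encoding)


# A byte-string filename (as it might come from the OS) and a unicode template
raw_name = u'caf\xe9.txt'.encode('utf-8')
message = build_message(u'Cannot open %s', raw_name, 'utf-8')

# Rule 6: encode only at the very end, right before the text leaves the program
encoded_message = to_bytes(message, 'utf-8')
```

Both helpers return their argument unchanged if it already has the requested type, so an extra call (rule 4) costs a type check but should not raise.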

############ file UnicodeHandler.py ##################################
# -*- coding: iso-8859-1 -*-
import Tkinter
import sys
import locale
import codecs

def _find_codec(encoding):
    # return True if the requested codec is available, else return False
    try:
        codecs.lookup(encoding)
        return True
    except LookupError:
        print 'Warning: codec %s not found' % encoding
        return False

def _sysencoding():
    # try to guess the system default encoding
    try:
        enc = locale.getpreferredencoding().lower()
        if _find_codec(enc):
            print 'Setting locale to %s' % enc
            return enc
    except AttributeError:
        # our python is too old, try something else
        pass
    # getdefaultlocale() may report no encoding at all (None)
    enc = locale.getdefaultlocale()[1]
    if enc and _find_codec(enc.lower()):
        print 'Setting locale to %s' % enc.lower()
        return enc.lower()
    # the last try; sys.stdin.encoding can be None as well
    enc = getattr(sys.stdin, 'encoding', None)
    if enc and _find_codec(enc.lower()):
        print 'Setting locale to %s' % enc.lower()
        return enc.lower()
    # aargh, nothing good found, fall back to latin1 and hope for the best
    print 'Warning: cannot find usable locale, using latin-1'
    return 'iso-8859-1'

sysencoding = _sysencoding()

def fsdecode(input, errors='strict'):
    '''Fail-safe decodes a string into unicode.'''
    if not isinstance(input, unicode):
        return unicode(input, sysencoding, errors)
    return input

def fsencode(input, errors='strict'):
    '''Fail-safe encodes a unicode string into system default encoding.'''
    if isinstance(input, unicode):
        return input.encode(sysencoding, errors)
    return input

class UnicodeVar(Tkinter.StringVar):
    def __init__(self, master=None, errors='strict'):
        Tkinter.StringVar.__init__(self, master)
        self.errors = errors
        # convert the value to unicode every time it is written
        self.trace('w', self._str2unicode)

    def _str2unicode(self, *args):
        old = self.get()
        if not isinstance(old, unicode):
            new = fsdecode(old, self.errors)
            self.set(new)
#######################################################################

So before I start to mess up all of my code, maybe someone can give me a hint if I have still forgotten something I should keep in mind, or if I am completely wrong somewhere.
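
For rule 5, this is roughly the pattern I have in mind for file operations, written as a small sketch. fs_path and is_readable are hypothetical names, and falling back to UTF-8 when the interpreter reports no filesystem encoding is purely an assumption on my part. As with the sketch for the other rules, it avoids Python-2-only syntax.

```python
import os
import sys

text_type = type(u'')   # "unicode" on Python 2, "str" on Python 3


def fs_path(name, encoding=None):
    """Encode a unicode filename into a byte string for file operations.

    Hypothetical helper. If no encoding is given, the interpreter's
    filesystem encoding is used; UTF-8 is an assumed last-resort fallback.
    """
    if isinstance(name, text_type):
        encoding = encoding or sys.getfilesystemencoding() or 'utf-8'
        return name.encode(encoding)
    return name


def is_readable(name):
    """Check read access, encoding the filename first (rule 5)."""
    return os.access(fs_path(name), os.R_OK)
```

The os.path manipulations would still be done on the unicode name; only the final call that touches the file system gets the encoded version.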

Thanks in advance

Michael