On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
<wolfgangmeiner...@web.de> wrote:
> Whenever i 'cross the border' of my program, i have to encode the 'list
> of bytes' to an unicode string or decode the unicode string to a 'list
> of bytes' which is meaningful to the world outside.

Most people use "encode" and "decode" the other way around: you encode a
Unicode string as UTF-8, and decode UTF-8 bytes into a Unicode string. But
yes, you're correct.

> So encode early, decode lately means, to do it as near to the border as
> possible and to encode/decode i need a coding system, for example 'utf8'

Correct on both counts. (In the usual terminology that's "decode early,
encode late".)

> That means, there should be an encoding/decoding possibility to every
> interface i can use: files, stdin, stdout, stderr, gui (should be the
> most important ones).

In Python 3, text-mode file objects (as returned by open()) have an
encoding. IIRC the default comes from your locale settings, so it is often
but not always UTF-8; pass encoding='utf8' explicitly to be safe. In
Python 2, codecs.open() or io.open() gives you the same thing. GUI work
depends on your GUI toolkit, which might well accept Unicode strings
directly; check the docs.

> def __repr__(self):
>     return u'My name is %s' % self.Name

This means that __repr__() returns a Unicode string.

> # this does work
> print a.__repr__()
>
> # throws an error if default encoding is ascii
> # but works if default encoding is utf8
> print a
>
> # throws an error because a is not a string
> print unicode(a, encoding='utf8')

In Python 2, the __repr__ function is supposed to return a string (str)
object, not a Unicode one. See
http://docs.python.org/reference/datamodel.html#object.__repr__ for that
and other advice on writing __repr__. The problems you're seeing are a
result of Python calling a.__repr__() and then converting the Unicode
return value to a str using the default encoding. With an ASCII default,
that fails on the 'ü'.

This would work:

def __repr__(self):
    return (u'My name is %s' % self.Name).encode('utf8')

Alternatively, migrate to Python 3, where strings are Unicode by default.

I tested this in Python 3.2 on Windows, but it should work on anything in
the 3.x branch:

class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        return 'My name is %s' % self.Name

if __name__ == '__main__':
    a = NoEnc('Müller')

    # this will still work (print is now a function, not a statement)
    print(a.__repr__())

    # this will work in Python 3.x
    print(a)

    # 'unicode' has been renamed to 'str', but a is not a bytes object,
    # so asking str() to decode it makes no sense and raises TypeError
    try:
        print(str(a, encoding='utf8'))
    except TypeError as e:
        print('TypeError:', e)

    # to convert it to UTF-8, convert it to a string with str() or repr()
    # and then encode:
    print(str(a).encode('utf8'))

############################

Note that the last one will probably not do what you expect. The Python 3
'print' function (it's not a statement any more, so you need parentheses
around its argument) wants a Unicode string, so you don't need to encode
it. When you encode a Unicode string as in the last example, you get back
a bytes object (an array of bytes), which looks like this:

b'My name is M\xc3\xbcller'

The print function wants Unicode, though, so it takes this unexpected
object and calls str() on it, hence the odd display.

Hope that helps!

Chris Angelico
--
http://mail.python.org/mailman/listinfo/python-list
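
As a postscript on the 'border' idea from the top of this thread, here is
a minimal Python 3 sketch of doing the conversion only at the edges of the
program. The function names (decode_input, encode_output,
roundtrip_via_file) are made up for illustration and are not from any
library, and 'utf8' is simply assumed as the outside-world encoding:

```python
import os
import tempfile

def decode_input(raw, encoding='utf8'):
    """Input border: turn bytes from the outside world into a str."""
    return raw.decode(encoding)

def encode_output(text, encoding='utf8'):
    """Output border: turn a str back into bytes for the outside world."""
    return text.encode(encoding)

def roundtrip_via_file(text, encoding='utf8'):
    """Write text to a temporary file and read it back, naming the
    encoding explicitly instead of relying on the platform default."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w', encoding=encoding) as f:
            f.write(text)
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)

if __name__ == '__main__':
    raw = b'M\xc3\xbcller'             # UTF-8 bytes arriving from outside
    name = decode_input(raw)            # str from here on
    greeting = 'My name is %s' % name   # all internal work uses str
    print(encode_output(greeting))      # b'My name is M\xc3\xbcller'
    print(roundtrip_via_file(greeting) == greeting)  # True
```

Everything between the two border functions only ever sees str objects, so
nothing inside the program needs to know which encoding the outside world
uses.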