On 31.05.11 23:56, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
> <wolfgangmeiner...@web.de> wrote:
>> Whenever I 'cross the border' of my program, I have to encode the 'list
>> of bytes' to a Unicode string or decode the Unicode string to a 'list
>> of bytes' which is meaningful to the world outside.
>
> Most people use "encode" and "decode" the other way around: you encode
> a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
> you're correct.
OK. I think I will go along with the majority on this point. I think I
mixed up unicodestring = unicode(bytestring, encoding='utf8') and
bytestring = u'unicodestring'.encode('utf8').

>
>> So encode early, decode late means doing it as near to the border as
>> possible, and to encode/decode I need a coding system, for example 'utf8'.

I think I should change this to "decode early, encode late".

> Correct on both counts.
>
>> That means there should be an encoding/decoding possibility for every
>> interface I can use: files, stdin, stdout, stderr and the GUI (those
>> should be the most important ones).
>
> The file objects returned by open() have an encoding. In Python 3, if
> you don't pass one, it defaults to the platform's preferred encoding
> (locale.getpreferredencoding()), which is not always UTF-8, so it is
> safest to pass encoding='utf8' explicitly. GUI work depends on your GUI
> toolkit, and might well accept Unicode strings directly; check the docs.
>
>> def __repr__(self):
>>     return u'My name is %s' % self.Name
>
> This means that __repr__() will return a Unicode string.
>
>> # this does work
>> print a.__repr__()
>>
>> # throws an error if default encoding is ascii
>> # but works if default encoding is utf8
>> print a
>>
>> # throws an error because a is not a string
>> print unicode(a, encoding='utf8')
>
> The __repr__ function is supposed to return a str (byte string) object
> in Python 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
> for that and other advice on writing __repr__. The problems you're
> seeing come from print (via str()) and the built-in repr() calling
> a.__repr__() and then converting the Unicode return value to a str with
> the default encoding, which is normally ASCII and cannot represent 'ü'.
>
> This would work:
> def __repr__(self):
>     return (u'My name is %s' % self.Name).encode('utf8')
>
> Alternatively, migrate to Python 3, where strings are Unicode by
> default.
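"Decode early, encode late" at the file border can be sketched as follows. This assumes Python 3, whose open() accepts an encoding argument in text mode. The helper copy_upper and the file names in.txt/out.txt are hypothetical, and the program does nothing more than upper-case the text.

```python
import os
import tempfile


def copy_upper(src_path, dst_path):
    """Hypothetical helper: copy a UTF-8 text file, upper-casing every line."""
    # Decode early: a text-mode file opened with an explicit encoding
    # decodes the bytes on disk, so each line arrives as str.
    with open(src_path, encoding='utf8') as src:
        # Encode late: the output file encodes str back to UTF-8 bytes
        # only as it writes them out.
        with open(dst_path, 'w', encoding='utf8') as dst:
            for line in src:
                # In between, the program only ever handles str (Unicode text).
                dst.write(line.upper())


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'in.txt')
    dst_path = os.path.join(tmp, 'out.txt')
    # Simulate bytes arriving from "outside": UTF-8 encoded text on disk.
    with open(src_path, 'wb') as f:
        f.write('Müller\n'.encode('utf8'))
    copy_upper(src_path, dst_path)
    # Read the raw bytes back to see what actually crossed the border.
    with open(dst_path, 'rb') as f:
        result = f.read()
```

The explicit encoding='utf8' on both open() calls is what keeps the program independent of the platform's default encoding.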
> I tested this in Python 3.2 on Windows, but it should work on
> anything in the 3.x branch:
>
> class NoEnc:
>     def __init__(self, Name=None):
>         self.Name = Name
>     def __repr__(self):
>         return 'My name is %s' % self.Name
>
> if __name__ == '__main__':
>
>     a = NoEnc('Müller')
>
>     # this will still work (print is now a function, not a statement)
>     print(a.__repr__())
>
>     # this will work in Python 3.x
>     print(a)
>
>     # 'unicode' has been renamed to 'str', but the text is already
>     # Unicode, so this makes no sense: str() only accepts an encoding
>     # for bytes, so it raises TypeError and is commented out here
>     # print(str(a, encoding='utf8'))
>
>     # to convert it to UTF-8, convert it to a string with str() or
>     # repr(), then encode it and print the result:
>     print(str(a).encode('utf8'))
> ############################
>
> Note that the last one will probably not do what you expect. The
> Python 3 'print' function (it's not a statement any more, so you need
> parentheses around its argument) wants a Unicode string, so you don't
> need to encode it. When you encode a Unicode string as in the last
> example, it returns a bytes object (a sequence of bytes), which looks
> like this: b'My name is M\xc3\xbcller'. The print function wants
> Unicode, though, so it takes this unexpected object and calls str() on
> it, hence the odd display.
>
> Hope that helps!

Yes, it helped a lot. One last question: when I have a free choice and
don't know either Python 2 or Python 3 very well, which would be the
recommended choice?

>
> Chris Angelico

Wolfgang
--
http://mail.python.org/mailman/listinfo/python-list
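The str/bytes distinction in Chris's Python 3 example can be made concrete with a small sketch. It reuses his NoEnc class, and the variable names are only illustrative. Each value is captured instead of printed, except for the final ASCII-only print.

```python
class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        # In Python 3, __repr__ must return str, which is already Unicode.
        return 'My name is %s' % self.Name


a = NoEnc('Müller')

text = str(a)                   # no __str__, so this falls back to __repr__
data = text.encode('utf8')      # str -> bytes, for the world outside
shown = str(data)               # what print(data) displays: the bytes' repr
restored = data.decode('utf8')  # bytes -> str again

try:
    str(a, encoding='utf8')     # the encoding argument only works on bytes
    decode_failed = False
except TypeError:
    decode_failed = True

print(data)  # prints b'My name is M\xc3\xbcller'
```

If you really do need raw bytes on the console, Python 3's sys.stdout.buffer.write(data) writes them to the underlying binary stream without the b'...' wrapper. Normally, though, you would just print the str and let the stream do the encoding.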