Thanks for the help! My problem was actually:

>>> a = ["velja\xe8a 2009"]
>>> print a      # will print ['velja\xe8a 2009']
>>> print a[0]   # will print veljaèa 2009

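For anyone reading this later, the difference in the session above is repr versus str: printing a list shows the repr of each element, which escapes non-ASCII bytes, while printing the element itself sends the raw byte to the terminal. Below is a minimal sketch of the same effect. It is written for Python 3, which is an assumption on my part (the session above is Python 2), so the command output is modelled as a bytes object:

```python
# Python 3 sketch; in the original Python 2 session this value is a plain str.
raw = b"velja\xe8a 2009"   # the same byte string as in the session above

items = [raw]

# str() of a list uses repr() for each element, so the byte is shown escaped.
list_text = str(items)
print(list_text)           # [b'velja\xe8a 2009']

# The escape is only a display form: the data still holds one byte, 0xE8.
print(len(raw))            # 12
print(raw[5] == 0xE8)      # True
```

So nothing is wrong with the data itself; only the way the list is displayed differs.
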
"Hrvoje Niksic" <hnik...@xemacs.org> wrote in message news:87ocwzzvym....@mulj.homelinux.net... > "Gabriel Genellina" <gagsl-...@yahoo.com.ar> writes: > >>> I'm playing with os.popen function. >>> a = os.popen("somecmd").read() >>> >>> If one of the lines contains characters like "e", "a"or any other it >>> loks >>> line this "velja\xe8a 2009" with that "\xe8". It prints fine if i go: >>> >>> for i in a: >>> print i: >> >> '\xe8' is a *single* byte (not four). It is the 'LATIN SMALL LETTER E >> WITH GRAVE' Unicode code point u'e' encoded in the Windows-1252 >> encoding (and latin-1, and others too). > > Note that it is also 'LATIN SMALL LETTER C WITH CARON' (U+010D or > u'è'), encoded in Windows-1250, which is what the OP is likely using. > > The rest of your message stands regardless: there is no problem, at > least as long as the OP only prints out the character received from > somecmd to something else that also expects Windows-1250. The problem > would arise if the OP wanted to store the string in a PyGTK label > (which expects UTF8) or send it to a web browser (which expects > explicit encoding, probably defaulting to UTF8), in which case he'd > have to disambiguate whether '\xe8' refers to U+010D or to U+00E8 or > something else entirely. > > That is the problem that Python 3 solves by requiring (or strongly > suggesting) that such disambiguation be performed as early in the > program as possible, preferrably while the characters are being read > from the outside source. A similar approach is possible using Python > 2 and its unicode type, but since the OP never specified exactly which > problem he had (except for the repr/str confusion), it's hard to tell > if using the unicode type would help.
--
http://mail.python.org/mailman/listinfo/python-list