Re: os.popen encoding!

Hrvoje Niksic Wed, 18 Feb 2009 06:10:41 -0800

"Gabriel Genellina" <[email protected]> writes:

>> I'm playing with os.popen function.
>> a = os.popen("somecmd").read()
>>
>> If one of the lines contains characters like "è", "æ" or any other, it looks
>> like this: "velja\xe8a 2009", with that "\xe8". It prints fine if I go:
>>
>> for i in a:
>>     print i
>
> '\xe8' is a *single* byte (not four). It is the 'LATIN SMALL LETTER E
> WITH GRAVE' Unicode code point u'è' encoded in the Windows-1252
> encoding (and latin-1, and others too).


Note that it is also 'LATIN SMALL LETTER C WITH CARON' (U+010D or
u'č'), encoded in Windows-1250, which is what the OP is likely using.

The rest of your message stands regardless: there is no problem, at
least as long as the OP only prints out the characters received from
somecmd to something else that also expects Windows-1250.  The problem
would arise if the OP wanted to put the string in a PyGTK label
(which expects UTF-8) or send it to a web browser (which expects an
explicitly declared encoding, probably defaulting to UTF-8), in which
case he'd have to disambiguate whether '\xe8' refers to U+010D or to
U+00E8 or something else entirely.
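
To make the ambiguity concrete, here is a minimal sketch (Python 3
syntax, not the OP's code) that decodes the same bytes under both
candidate encodings.  The sample bytes are taken from the OP's message;
that they really came from a Windows-1250 system is my assumption.

```python
# The bytes as they might arrive from the command (sample from the OP).
raw = b"velja\xe8a 2009"

# The same byte 0xE8 maps to different characters depending on the codec.
as_cp1250 = raw.decode("cp1250")  # Central European: U+010D, c with caron
as_cp1252 = raw.decode("cp1252")  # Western European: U+00E8, e with grave

# ascii() keeps the output printable on any console.
print(ascii(as_cp1250))  # 'velja\u010da 2009'
print(ascii(as_cp1252))  # 'velja\xe8a 2009'

# Only after choosing a decoding can the text be re-encoded for a
# consumer that expects UTF-8.
utf8_bytes = as_cp1250.encode("utf-8")
print(utf8_bytes)  # b'velja\xc4\x8da 2009'
```

Nothing in the bytes themselves says which reading is right; that
knowledge has to come from whoever knows what somecmd emits.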

That is the problem that Python 3 solves by requiring (or strongly
suggesting) that such disambiguation be performed as early in the
program as possible, preferably while the characters are being read
from the outside source.  A similar approach is possible using Python
2 and its unicode type, but since the OP never specified exactly which
problem he had (except for the repr/str confusion), it's hard to tell
if using the unicode type would help.
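
For illustration only, decoding at the boundary could look roughly
like the sketch below.  It is written for Python 3 and uses subprocess
rather than os.popen; the helper name read_command_text, the stand-in
command, and the choice of cp1250 are all my assumptions, since the OP
never said what somecmd is or which code page it uses.

```python
import subprocess
import sys


def read_command_text(argv, encoding):
    """Run argv and return its stdout decoded with the given encoding.

    Hypothetical helper: the caller must know the command's encoding.
    """
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode(encoding)


# Stand-in for "somecmd": a child Python process that writes raw bytes,
# including the single byte 0xE8, to its standard output.
fake_cmd = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write(b'velja\xe8a 2009')",
]

# Decode once, as early as possible; from here on the program handles
# text, not bytes in some unstated encoding.
text = read_command_text(fake_cmd, "cp1250")

# Encode again only at the output boundary, for whatever the consumer
# expects (UTF-8 here, as for a PyGTK label).
for_label = text.encode("utf-8")
print(ascii(text))
```

In Python 2 the equivalent step would be calling .decode('cp1250') on
the str returned by read(), which yields a unicode object.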
--
http://mail.python.org/mailman/listinfo/python-list
