In <7slr5ife6...@mid.uni-berlin.de> "Diez B. Roggisch" <de...@nospam.web.de> writes:
>On 31.01.10 16:52, kj wrote:
>> I want to pass Chinese characters as command-line arguments to a
>> Python script. My terminal has no problem displaying these
>> characters, and passing them to the script, but I can't get Python
>> to understand them properly.
>>
>> E.g. if I pass one such character to the simple script
>>
>> import sys
>> print sys.argv[1]
>> print type(sys.argv[1])
>>
>> the first line of the output looks fine (identical to the input),
>> but the second line says "<type 'str'>". If I add the line
>>
>> arg = unicode(sys.argv[1])
>>
>> I get the error
>>
>> Traceback (most recent call last):
>>   File "kgrep.py", line 4, in <module>
>>     arg = unicode(sys.argv[1])
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0:
>> ordinal not in range(128)
>>
>> What must I do to get Python to recognize command-line arguments
>> as utf-8 Unicode?

>The last sentence reveals your problem: utf-8 is *not* unicode. It's an
>encoding of unicode, which is a crucial difference.

>From the outside you get byte-streams, and if these happen to be
>encoded in utf-8, you can simply decode them:

>arg = unicode(sys.argv[1], "utf-8")

Thanks!

kynn
-- 
http://mail.python.org/mailman/listinfo/python-list
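The point made in the reply, that arguments arrive as bytes and must be decoded with the encoding they were written in, can be sketched as follows. This is a minimal illustration written for Python 3 (an assumption: the thread itself uses Python 2, where `unicode(s, "utf-8")` does the job; Python 3 spells the same step `bytes.decode("utf-8")`, and `sys.argv` already holds decoded `str` values). The character U+8BCD is an arbitrary example whose UTF-8 form starts with byte 0xe8, and `arg_as_text` is a hypothetical helper, not part of any library.

```python
import os
import sys

# Bytes as a UTF-8 terminal would deliver the single character U+8BCD.
# The leading byte 0xe8 is the same kind of byte the 'ascii' codec rejected.
raw = b"\xe8\xaf\x8d"

# Decoding with the wrong codec reproduces the error from the thread.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc.reason  # 'ordinal not in range(128)'

# Decoding with the codec the bytes were actually encoded in works.
text = raw.decode("utf-8")
print(len(raw), len(text))  # prints: 3 1  (3 bytes, 1 code point)


def arg_as_text(index):
    """Hypothetical helper: re-decode sys.argv[index] strictly as UTF-8.

    Python 3 has already decoded argv with the locale/filesystem encoding;
    os.fsencode() recovers the original bytes (assumes a POSIX system).
    """
    return os.fsencode(sys.argv[index]).decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    print(arg_as_text(1))
```

The key design point is the same in both Python versions: bytes become text only through an explicit decode with a named encoding, so the program never relies on an implicit 'ascii' default.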