On 01/28/2012 12:21 AM, contro opinion wrote:
>>>> s='你好'

On my computer, s is a byte string that contains the utf-8 encoding of 你好. This has nothing to do with python, though, and everything to do with the terminal and the line editor that python's interpreter is using. In other words, the string is encoded to utf-8 before python even sees it.
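To make the terminal-encoding point concrete, here is a small sketch. It uses Python 3 syntax (in the Python 2 of this thread, a byte string is written '...' and a text string u'...') and relies only on the standard 'utf-8' and 'gbk' codecs. Incidentally, the bytes '\xc4\xe3\xba\xc3' in the session quoted below are the GBK encoding of 你好. That suggests a GBK terminal rather than a utf-8 one, so in Python 2 that particular s would be decoded with s.decode('gbk').

```python
# Sketch in Python 3 syntax: bytes vs. text for the same two characters.
text = "\u4f60\u597d"  # 你好, written with \u escapes

# The same characters become different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(utf8_bytes)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(gbk_bytes)   # b'\xc4\xe3\xba\xc3', the bytes in the quoted session

# Decoding only recovers the text when you name the encoding the bytes
# were actually written in.
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text

# Naming the wrong codec fails (or, with some codecs, silently gives garbage).
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)
```

The exact bytes s ends up holding depend on the terminal, so the codec passed to decode() has to match that terminal's encoding.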
So in this instance, to convert s to a proper unicode string instead of a utf-8-encoded byte string, you do:

    us = s.decode('utf-8')

The encoding of s probably depends on your terminal's encoding. Mine is utf-8, so that's what s ends up encoded as.

This is confusing, isn't it? You are dealing with several things at once:

  1. the terminal's character set,
  2. the python interpreter's line editor (readline on my computer), and
  3. python itself.

When a script is run directly by the python interpreter from a file, you can declare the encoding of the python file in a comment at the beginning of the file, e.g. # -*- coding: utf-8 -*- (see http://www.python.org/dev/peps/pep-0263/). I think most text editors use utf-8 by default, so the string:

    s = '你好'

would, when viewed in a hex editor, already be encoded as utf-8.

> s = '\xc4\xe3\xba\xc3'
>>>> t=u'你好'
>>>> s
> '\xc4\xe3\xba\xc3'

The result of these two lines is going to differ depending on your terminal's encoding and the line editor. As I said before, the byte string that s is assigned is determined in this case not by python, but by the editor and terminal.

>>>> t
> u'\u4f60\u597d'
>>>> t=us
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'us' is not defined

Of course. There is no variable called 'us'.

>>>>
> how can i use us to express u'你好'??

Provided your python file, terminal, and editor all agree on the text encoding:

    s = u'你好'

Python normally uses whatever is set in the environment, which on my computer is en_US.UTF-8, hence utf-8. It could be different on your computer.

Or:

    s = u'\u4f60\u597d'

> can i add someting in us to express u'你好'??

That works directly in my terminal.

Unicode is definitely a challenge. Python 3 makes it easier by using unicode for its str type. But you still have to make sure your python source file is saved in the proper encoding (normally utf-8).

-- 
http://mail.python.org/mailman/listinfo/python-list