ygao wrote:
> I must use utf-8 for chinese.

Sure. But please don't do that:

>>>> import sys
>>>> reload(sys)
>>>> sys.setdefaultencoding("utf-8")

As Fredrik says, you should really avoid changing the default encoding.

>>>> s='\xe9\xab\x98' #this uff-8 string
>>>> ss=U'\xe9\xab\x98'
>>>> ss1=ss.encode('unicode_escape').decode('string_escape')
>>>> s1=s.decode('unicode_escape')
>>>> s1==ss
> True
>>>> ss1==s
> True

Ok. But how about that:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character, which uses 3 bytes in UTF-8. In your
example, ss has three characters, which are not Chinese, but European.

Regards,
Martin
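
For anyone reading this on a newer interpreter: the sessions above are Python 2. A minimal sketch of the same comparison in Python 3 syntax might look like the following. It assumes Python 3, where bytes and str are separate types, and the variable names are made up for illustration and are not from the thread.

```python
# Python 3 sketch (assumption: the thread itself used Python 2).
utf8_bytes = b'\xe9\xab\x98'   # the three UTF-8 bytes from the thread

# Decoding as UTF-8 yields a single character, U+9AD8.
as_utf8 = utf8_bytes.decode('utf-8')
print(len(as_utf8), as_utf8 == '\u9ad8')            # 1 True

# Mapping each byte to its own code point (the effect of u'\xe9\xab\x98')
# yields three characters from the Latin-1 range instead.
as_latin1 = utf8_bytes.decode('latin-1')
print(len(as_latin1), as_latin1 == '\xe9\xab\x98')  # 3 True

# Round trip: encode with the same codec that was used to decode.
assert as_utf8.encode('utf-8') == utf8_bytes
assert as_latin1.encode('latin-1') == utf8_bytes
```

Both decodes start from the same three bytes; only the codec decides whether they are read as one Chinese character or as three separate code points, and no change to the default encoding is needed for either.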