"ygao" <[EMAIL PROTECTED]> wrote: > >>> import sys > >>> sys.setdefaultencoding("utf-8")
hmm.  what kind of bootleg python is that?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding.  don't do that;
it'll only cause problems in the long run).

> >>> s='\xe9\xab\x98' #this utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
> >>>
> how do I get ss from s?
> Can there be a way to do this?

you have UTF-8 *bytes* in a Unicode text string?  sounds like someone's
made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform: it simply converts
each unicode character in the range 0-255 to the byte with the same value:

>>> s = ss.encode("iso-8859-1")
>>> s
'\xe9\xab\x98'
>>> s.decode("utf-8")
u'\u9ad8'
>>> import unicodedata
>>> unicodedata.name(s.decode("utf-8"))
'CJK UNIFIED IDEOGRAPH-9AD8'

but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings (look for bogus iso-8859-1 conversions).

</F>
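
for reference, the iso-8859-1 round trip described above can be wrapped in a
small helper.  this is only a sketch: the name undo_latin1_misdecode is made
up for illustration, and it assumes the damage really was "UTF-8 bytes
decoded as iso-8859-1".  if the string contains characters above U+00FF the
encode step raises UnicodeEncodeError, and if the bytes aren't valid UTF-8
the decode step raises UnicodeDecodeError.

```python
import unicodedata


def undo_latin1_misdecode(text):
    """Hypothetical helper: recover text whose UTF-8 bytes were
    wrongly decoded as iso-8859-1 somewhere upstream."""
    # iso-8859-1 maps code points 0-255 to the same byte values,
    # so encoding with it gives back the original bytes unchanged.
    raw = text.encode("iso-8859-1")
    # Decode those bytes with the encoding they were really in.
    return raw.decode("utf-8")


# The Unicode string from the question: three characters that are
# really the three UTF-8 bytes of a single CJK character.
ss = u'\xe9\xab\x98'
fixed = undo_latin1_misdecode(ss)

print(len(ss), len(fixed))
print(unicodedata.name(fixed))
```

the u'' literals keep the snippet usable on both old and current
interpreters.  this only repairs the symptom; the real fix is still to find
the place where the bytes get decoded with the wrong codec.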