My file contains the text "Made in China", 中國製.

For 中 (http://www.fileformat.info/info/unicode/char/4e2d/index.htm):
UTF-16 (hex): 0x4E2D (4e2d)
UTF-8 (hex): 0xE4 0xB8 0xAD (e4b8ad)
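One way to reproduce the hex values listed above from Python is to encode each character to bytes and print those bytes as hex. This is a minimal sketch assuming Python 3 (where `bytes.hex()` exists, 3.5+), not the poster's Python 2 setup; the helper name `hex_encodings` is hypothetical.

```python
import unicodedata


def hex_encodings(ch):
    """Return the name, code point, and UTF-16 (big-endian) / UTF-8 bytes of ch as hex."""
    return {
        "name": unicodedata.name(ch),
        "code point": "U+%04X" % ord(ch),
        # Big-endian UTF-16 without a BOM, so the bytes match the code point for BMP characters
        "utf-16-be": ch.encode("utf-16-be").hex(),
        # The UTF-8 byte sequence, e.g. e4b8ad for 中
        "utf-8": ch.encode("utf-8").hex(),
    }


for ch in "中國製":
    info = hex_encodings(ch)
    print(ch, info["code point"], info["utf-16-be"], info["utf-8"], info["name"])
```

Python's repr of a unicode string (`u'\u4e2d'`) shows the code point, not the UTF-8 bytes; only encoding to UTF-8 yields e4 b8 ad. In Python 2 the equivalent for a unicode object is `u'\u4e2d'.encode('utf-8').encode('hex')`, which returns `'e4b8ad'`.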
Reading it with od -cx utf_a.text gives:

0000000   中  **  **  國  **  **  製  **  **  \n
        e4b8    ade5    9c8b    e8a3    bd0a
0000012

Reading it with Python gives the output below. Why does Python display it this way?

中國製
u'\u4e2d\u570b\u88fd\n'  <--- Value 中國製 <-- UTF-8 value
u'\u4e2d' 中       CJK UNIFIED IDEOGRAPH-4E2D
u'\u570b' 國       CJK UNIFIED IDEOGRAPH-570B
u'\u88fd' 製       CJK UNIFIED IDEOGRAPH-88FD

Here is the code (Python 2):

import codecs
import unicodedata

# options.filename presumably comes from the script's command-line parsing (not shown)
with codecs.open(options.filename, 'r', 'utf-8') as infile:
    for line in infile:   # each line is already decoded to a unicode object
        #print repr(line)
        #print "========="
        print line.encode("utf-8")
        for keys in line.split(","):
            print repr(keys), " <--- Value", keys.encode("utf-8"), "<-- UTF-8 value"
            for key in keys:
                try:
                    name = unicodedata.name(key)
                except ValueError:
                    # characters such as u'\n' have no Unicode name; skip them
                    continue
                print "%-9s %-8s %-30s" % (repr(key), key.encode("utf-8"), name)

How do I display the UTF-8 bytes e4b8ad for 中 in Python?

--
http://mail.python.org/mailman/listinfo/python-list