prats wrote:
> Sorry, I did not read your point correctly. It works fine. Thanks for
> your help.
> I have one more query. It was said that the text I was supposed to show
> was written using the "ISO-2022-JP" charset, but it didn't display
> correctly when I decoded it using that charset. It worked fine with the
> "shift-jis" encoding, though. Is that the default charset used by Python,
> i.e. are bytes "shift-jis" by default?
No, the default charset in Python (2.x) is ascii. There is no absolutely reliable way to find out the encoding of arbitrary bytes. But if you have more than ten bytes and you know some properties of the text (for example, you're sure it contains only English and Japanese), the first thing you can do is rule out the encodings under which the bytes fail to decode:

    def valid_en_jp_encodings(data):
        # Pure ASCII needs no further guessing, so report it on its own.
        try:
            data.decode("ascii")
            return ["ascii"]
        except UnicodeDecodeError:
            pass
        # Otherwise keep every candidate encoding that decodes without error.
        encodings = "utf-8", "shift-jis", "iso-2022-jp", "euc-jp"
        valid = []
        for encoding in encodings:
            try:
                data.decode(encoding)
                valid.append(encoding)
            except UnicodeDecodeError:
                pass
        return valid

If this function returns a list with only one item, you're lucky. If it returns more than one item, things get more complicated: you can try http://chardet.feedparser.org/ to guess the encoding, or you can present the list of valid encodings to the user and let him/her make a choice. It is also possible that this function returns an empty list; you will need to display an error message in that case.

-- 
http://mail.python.org/mailman/listinfo/python-list
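The three outcomes described in the reply (one candidate, several, none) could be handled roughly like this. It is a minimal Python 3 sketch, not code from the original post: `decode_en_jp` and its `choose` callback are hypothetical names, with `choose` standing in for however you let the user pick an encoding. The candidate function is repeated so the snippet runs on its own.

```python
def valid_en_jp_encodings(data):
    """Return the encodings (from a fixed candidate list) that decode `data` cleanly."""
    try:
        data.decode("ascii")
        return ["ascii"]  # pure ASCII: no need to guess further
    except UnicodeDecodeError:
        pass
    valid = []
    for encoding in ("utf-8", "shift-jis", "iso-2022-jp", "euc-jp"):
        try:
            data.decode(encoding)
            valid.append(encoding)
        except UnicodeDecodeError:
            pass
    return valid


def decode_en_jp(data, choose):
    """Hypothetical helper: decode `data`, asking `choose` only when ambiguous.

    `choose` is an assumed callback that receives the list of candidate
    encodings (for example, to show them to the user) and returns one of them.
    """
    candidates = valid_en_jp_encodings(data)
    if not candidates:
        # Nothing decodes cleanly: report an error rather than guess.
        raise ValueError("bytes are not valid in any candidate encoding")
    if len(candidates) == 1:
        encoding = candidates[0]  # the lucky case: only one possibility
    else:
        encoding = choose(candidates)  # ambiguous: defer to the caller/user
    return data.decode(encoding), encoding
```

For example, `decode_en_jp(b"hello", choose=None)` returns `('hello', 'ascii')` without ever calling `choose`. Passing the decision in as a callback keeps the helper independent of how ambiguity is resolved, so the same code could prompt a user or delegate to a guesser such as chardet.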