While I was trying to come up with a nice intro, "Michal" wrote:
> Hello,
> is there any way to detect string encoding in Python?
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc). I want to detect it, and
> encode it to utf-8 (with string function encode).
> Thank you for any answer
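Hi,

As you have already heard, you can't be sure, but you can guess. I use a method like this:

    def guess_encoding(text):
        # Return the first charset in guess_list that decodes text
        # without errors, or None if none of them does.
        for best_enc in guess_list:
            try:
                text.decode(best_enc, "strict")
            except (UnicodeDecodeError, LookupError):
                # invalid bytes for this charset, or unknown charset name
                pass
            else:
                return best_enc
        return None

'guess_list' is an ordered list of charset names, like this:

    ['us-ascii', 'iso-8859-1', 'iso-8859-2', ..., 'windows-1250', 'windows-1252', ...]

Of course, you can remove the charsets you are sure you will never encounter. Keep in mind that the order matters: a strict decode only tells you the bytes are valid in that charset, and single-byte charsets such as iso-8859-1 and iso-8859-2 accept every possible byte, so any charset listed after one of them is never tried. Once you have a charset, text.decode(enc).encode('utf-8') gives you the UTF-8 version.

-- 
This might really be the spark that makes the drop overflow.
|\ |       |HomePage   : http://nem01.altervista.org
| \|emesis |XPN (my nr): http://xpn.altervista.org

-- 
http://mail.python.org/mailman/listinfo/python-list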
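The guess-by-strict-decoding approach from the reply can be sketched in Python 3, where files are read as bytes, together with the final step the question asks about: re-encoding to UTF-8. This is a minimal sketch under stated assumptions, not a detection library: CANDIDATES, convert_to_utf8 and the candidate order are hypothetical choices, and guess_encoding here is an adapted version that takes the list as a parameter. A strict decode only proves the bytes are valid in a charset, not that it is the right one. UTF-8 is tried before the single-byte charsets because Latin-2 or cp1250 text with accented letters is rarely valid UTF-8, while iso-8859-2 decodes every possible byte and therefore works as a catch-all at the end.

```python
from pathlib import Path

# Hypothetical candidate list. Order matters: stricter charsets first.
# iso-8859-2 accepts every byte value, so it must come last; anything
# after it would never be tried.
CANDIDATES = ["us-ascii", "utf-8", "windows-1250", "iso-8859-2"]


def guess_encoding(data: bytes, candidates=CANDIDATES):
    """Return the first candidate that decodes `data` strictly, or None."""
    for enc in candidates:
        try:
            data.decode(enc, "strict")
        except (UnicodeDecodeError, LookupError):
            continue  # invalid bytes for this charset, or unknown codec name
        return enc
    return None


def convert_to_utf8(src: Path, dst: Path, candidates=CANDIDATES):
    """Write `src` re-encoded as UTF-8 to `dst`; return the charset assumed."""
    data = src.read_bytes()
    enc = guess_encoding(data, candidates)
    if enc is None:
        raise ValueError(f"none of {candidates} can decode {src}")
    # Work on bytes so line endings are copied unchanged.
    dst.write_bytes(data.decode(enc).encode("utf-8"))
    return enc
```

For example, convert_to_utf8(Path("in.txt"), Path("out.txt")) (hypothetical file names) returns the charset it assumed, or raises ValueError if no candidate fits; with the list above that can only happen if iso-8859-2 is removed.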