Ross Ridge wrote:
> [EMAIL PROTECTED] wrote:
>
>> try:
>>     (uni, dummy) = utf8dec(s)
>> except:
>>     (uni, dummy) = iso88591dec(s, 'ignore')
>
> Is there really any point in even trying to decode with UTF-8? You
> might as well just assume ISO 8859-1.

The point is that you can tell UTF-8 reliably. If the data decodes as
UTF-8, it *is* UTF-8, because no other encoding in the world produces
the same byte sequences (except for ASCII, which is a UTF-8 subset).
So only if it is not UTF-8 does the guessing start.

Regards,
Martin
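
For illustration, the try-UTF-8-first approach discussed above might look roughly like this in Python 3. This is only a sketch: the name decode_with_fallback is made up for the example, and it uses bytes.decode rather than the utf8dec/iso88591dec decoder functions from the quoted snippet.

```python
def decode_with_fallback(data: bytes) -> tuple:
    """Return (text, encoding_used) for bytes of unknown encoding.

    Hypothetical helper, not part of any library.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid byte
        # sequence, so success is strong evidence the data is UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte to a code point, so this never
        # fails. It is a guess, not a detection.
        return data.decode("iso-8859-1"), "iso-8859-1"


# 'cafe' with e-acute, encoded as UTF-8 (two bytes for the last letter).
print(decode_with_fallback(b"caf\xc3\xa9"))
# The same word encoded as ISO 8859-1: a lone 0xE9 is not valid UTF-8,
# so the fallback branch is taken.
print(decode_with_fallback(b"caf\xe9"))
```

Catching UnicodeDecodeError specifically, instead of a bare except, keeps unrelated errors from being silently treated as "not UTF-8".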