Ivan Van Laningham wrote:
>
> It seems to me that if I want to try to read an unknown file
> using an exhaustive list of possible encodings ...
Supposing such a list existed: what do you mean by "unknown file"? That the encoding is unknown?

Possibility 1: You are going to try to decode the file from a "legacy" encoding to Unicode, stopping at the first 'success' (defined how?). But the file could be decoded by *several* codecs into Unicode without an exception being raised. A simple example: each of the encodings ['iso-8859-' + x for x in '12459'] assigns a character to *every* one of the 256 possible byte values, so none of them can ever fail to decode. There are various language-guessing algorithms based on e.g. frequency of n-grams ... try Google.

Possibility 2: You "know" the file is in a Unicode encoding, e.g. utf-8, have successfully decoded it to Unicode, and are going to try to encode it in a "legacy" encoding, but you don't know which one is appropriate? Sorry, same "But".
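
To make the "But" concrete, here is a minimal sketch, assuming Python 3. The sample bytes and the helper name decodes_cleanly are made up for illustration; only the list of encodings comes from the text above. It tries each codec in turn and collects every one that decodes without raising.

```python
# Hypothetical sample: the bytes of "café" as written by an ISO-8859-1 editor.
data = b"caf\xe9"

# The five encodings named above.
candidates = ["iso-8859-" + x for x in "12459"]


def decodes_cleanly(raw, encodings):
    """Return {encoding: text} for every encoding that decodes raw without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This codec rejects the input; skip it.
            pass
    return results


results = decodes_cleanly(data, candidates)
for enc, text in results.items():
    print(enc, repr(text))

# Every possible byte value is accepted by all five codecs as well.
all_bytes = bytes(range(256))
accepts_everything = decodes_cleanly(all_bytes, candidates)
print(len(accepts_everything), "of", len(candidates), "codecs accept all 256 byte values")
```

All five codecs report 'success' on the same input, yet they do not all produce the same text, so "first codec that does not raise" tells you nothing about which encoding is the right one.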