Re: Codecs

John Machin Sun, 10 Jul 2005 17:30:29 -0700

Ivan Van Laningham wrote:
> 
> It seems to me that if I want to try to read an unknown file
> using an exhaustive list of possible encodings ...



Supposing such a list existed:

What do you mean by "unknown file"? That the encoding is unknown?

Possibility 1:
You are going to try to decode the file from a "legacy" encoding to
Unicode until the first 'success' (defined how?)? But the file could be
decoded by *several* codecs into Unicode without an exception being
raised. Just a simple example: each of the encodings
['iso-8859-' + x for x in '12459'] assigns a character to *all* 256
possible byte values, so any file at all decodes "successfully" under
every one of them.
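
To see this for yourself, here is a minimal sketch (assuming Python 3, where bytes.decode does the work; the variable names are made up for illustration). It feeds one of each possible byte value to the five codecs named above:

```python
# Python 3 sketch: each of these codecs accepts every possible byte value.
data = bytes(range(256))  # one of each possible byte, 0x00 through 0xFF

decoded = {}
for x in '12459':
    name = 'iso-8859-' + x
    # decode() would raise UnicodeDecodeError if any byte were undefined.
    decoded[name] = data.decode(name)

# Every decode "succeeds", yet the codecs do not agree on the text.
distinct_results = len(set(decoded.values()))
```

No exception is raised for any of the five, and distinct_results is greater than 1, so "decoded without an exception" tells you nothing about which codec was the right one.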

There are various language-guessing algorithms based on e.g. frequency 
of ngrams ... try Google.
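
As a very rough illustration of the ngram idea (a toy sketch only, not any particular published algorithm; guess_encoding, bigrams and the sample strings are all hypothetical, and it assumes Python 3), you could score each candidate decoding by how many of its character pairs also occur in a sample of text in the language you expect:

```python
from collections import Counter

def bigrams(text):
    """Return all overlapping two-character slices of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def guess_encoding(data, candidates, reference_text):
    """Pick the candidate whose decoding shares the most bigrams
    with reference_text (a sample of the expected language)."""
    profile = Counter(bigrams(reference_text))

    def score(name):
        try:
            decoded = data.decode(name)
        except UnicodeDecodeError:
            return -1  # rank codecs that fail below all that succeed
        # Counter returns 0 for bigrams the reference never contained.
        return sum(profile[b] for b in bigrams(decoded))

    return max(candidates, key=score)

# Hypothetical usage: a Russian reference sample, unknown bytes.
reference = 'привет мир это пример текста на русском языке'
unknown = 'это русский текст'.encode('iso-8859-5')
candidates = ['iso-8859-' + x for x in '12459']
best = guess_encoding(unknown, candidates, reference)
```

A real guesser needs far larger reference samples, one per language, and some way to say "none of the above"; this only shows the shape of the approach.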

Possibility 2:
You "know" the file is in a Unicode-encoding e.g. utf-8, have 
successfully decoded it to Unicode, and are going to try to encode the 
file in a "legacy" encoding but you don't know which one is appropriate?
Sorry, same "But": the text could be encoded by *several* legacy codecs
without an exception being raised.
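
A minimal sketch of the encoding direction (again assuming Python 3; encodable_in is a hypothetical helper, not a library function) shows the same ambiguity:

```python
def encodable_in(text, candidates):
    """Return the candidate encodings that can encode text without error."""
    ok = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this codec has no byte for some character in text
        ok.append(name)
    return ok

candidates = ['iso-8859-' + x for x in '12459']
# 'caf\u00e9' is "cafe" with an acute accent on the final e.
survivors = encodable_in('caf\u00e9', candidates)
```

Here more than one candidate survives, so "encodes without an exception" narrows the list but does not pick an encoding for you.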



