On 20/12/10 17:53, Alec Battles wrote:
>>> I seem to remember that 'file' in Linux detects encodings, but it's
>>> also a matter of calling it by the exact same name...
>>
>> There is no foolproof way of detecting encoding unfortunately - you just
>> need to know what it is before you read the file.
>
> That's interesting. I wonder if there's a mathematical proof of the
> 'undecidability' of text encodings.

Hofstadter describes the problem in Gödel, Escher, Bach as the "Envelope
Problem" IIRC - you need to have some idea of how to decode any message
you are sent, and you even need to understand that it is a "message".
UNIX manages the latter for us by providing a filename - but how to
interpret the contents is entirely up to you. It might be UTF-8, it
might be a JPEG, it might be encrypted using AES. You need to know what
to expect to try and interpret the contents.
I bet there is a name for this (although probably not a proof), but I
don't know what it is ;)
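
To make the ambiguity concrete, here is a rough Python 3 sketch (not a
real detector). The helper name guess_encoding and its candidate list are
made up for illustration. It just tries each codec in turn, and because
latin-1 maps every byte to a character it "succeeds" on anything, so a
clean decode proves nothing about what the author intended.

```python
# The same bytes read under two different assumptions.
data = "caf\u00e9".encode("utf-8")    # five bytes: b'caf\xc3\xa9'

as_utf8 = data.decode("utf-8")        # the intended text
as_latin1 = data.decode("latin-1")    # also decodes cleanly, but to different text


def guess_encoding(raw, candidates=("ascii", "utf-8", "latin-1")):
    """Hypothetical helper: return the first candidate codec that decodes
    raw without raising. This is a guess, not a detection."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(repr(as_utf8), repr(as_latin1))
print(guess_encoding(b"hello"))       # plain ASCII bytes
print(guess_encoding(data))           # well-formed UTF-8
print(guess_encoding(b"caf\xe9"))     # not valid UTF-8, so it falls through
```

The last call reports latin-1, but those bytes would decode just as
happily as cp1252 or ISO-8859-15, which is the point: nothing in the
bytes themselves says which one was meant.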
Cheers,
Doug.