On Wed, Jun 4, 2014 at 2:34 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> Outside of those three kinds of files, I would expect that *by far* the
> single largest kind of file is text. Some text is wrapped in a binary
> layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human
> readable text, including web pages (html) and XML.

In terms of file I/O in Python, text wrapped in a binary layer has to be
treated as binary, not text. There's no difference between a JPEG file
that has some textual EXIF information and an ODT file that's a whole
lot of zipped up text; both of them have to be read as binary, then
unpacked according to the container's specs, and then the text portion
decoded according to an encoding like UTF-8.

But you're quite right that a large proportion of files out there really
are text.

ChrisA
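
A rough sketch of the read-as-binary, unpack, then decode sequence described above, using only the standard library. The helper name read_member_text and the sample payload are made up for illustration. The member name "content.xml" and the UTF-8 encoding are assumptions that fit typical ODT files, and a real reader should take the encoding from the XML declaration rather than hard-coding it.

```python
import io
import zipfile


def read_member_text(container_bytes, member="content.xml", encoding="utf-8"):
    """Treat the container as binary, unpack one member, then decode it.

    The function name and its defaults are illustrative assumptions.
    """
    # Step 1: the container is bytes; zipfile needs a binary file object.
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as zf:
        # Step 2: unpack according to the container format. Still bytes.
        raw = zf.read(member)
    # Step 3: only now decode the text portion into a str.
    return raw.decode(encoding)


# Build a tiny zip in memory to stand in for an .odt file (made-up content).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<text>caf\u00e9</text>".encode("utf-8"))
sample = buf.getvalue()

text = read_member_text(sample)
print(type(sample).__name__, type(text).__name__)
```

For a file on disk the same idea applies: open it with mode "rb" (or hand the path straight to zipfile.ZipFile), never with a text mode, because the decoding step only makes sense after the container has been unpacked.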