a...@pythoncraft.com (Aahz) writes:

> In article <mailman.2434.1265983307.28905.python-l...@python.org>,
> Lloyd Zusman <l...@asfast.com> wrote:
> >if (-T $filename) { print "file contains 'text' characters\n"; }
> >if (-B $filename) { print "file contains 'binary' characters\n"; }
>
> Assuming you're on a Unix-like system or can install Cygwin, the
> standard response is to use the "file" command. It's *much* more
> sophisticated.
Indeed, the ‘file’ command is an expected (though not standard) part of most Unix systems, and its use frees us from the lies of filenames about their contents.

The sophistication comes from an extensive database of heuristics (filesystem attributes, “magic” content signatures, and parsing) that are together known as the “magic database”. This database is maintained along with the ‘file’ program, and made accessible through a C library from the same code base called ‘magic’.

So, you don't actually need to run the ‘file’ command to access this sophistication. Just about every program on a GNU system that wants to display file types, such as the graphical file manager, will query the ‘magic’ library directly to get the file type.

The ‘file’ code base has for a while now also included a Python interface to this library, importable as the module name ‘magic’. Unfortunately it isn't registered at PyPI as far as I can tell. (There are several projects using the name “magic” that implement something similar, but they are nowhere near as sophisticated.)

On a Debian GNU/Linux system, you can install the ‘python-magic’ package to get this module. Otherwise, you can build it from the ‘file’ code base <URL:http://www.darwinsys.com/file/>.

-- 
 \     “I don't accept the currently fashionable assertion that any |
  `\   view is automatically as worthy of respect as any equal and |
_o__)  opposite view.” -- Douglas Adams                             |
Ben Finney
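
For illustration, here is a rough sketch of what querying the library from Python might look like. It assumes the ‘magic’ module built from the ‘file’ code base is installed and exposes ‘magic.open()’, ‘load()’ and ‘file()’; check the bindings you actually have, since other modules also use the name. The helper name ‘file_type’ is made up for this example, and the fallback to running ‘file --brief’ is an addition for systems where the module is missing.

```python
import shutil
import subprocess


def file_type(path):
    """Return a description of the file's contents, or None if no backend is available."""
    try:
        # Assumption: this is the binding shipped with the 'file' code base.
        import magic
    except ImportError:
        magic = None

    if magic is not None and hasattr(magic, "open"):
        cookie = magic.open(magic.MAGIC_NONE)
        cookie.load()  # load the default magic database
        try:
            return cookie.file(path)
        finally:
            cookie.close()

    # Fallback: ask the 'file' command itself, if it is on the PATH.
    if shutil.which("file"):
        result = subprocess.run(
            ["file", "--brief", path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()

    return None
```

Calling ‘file_type("/etc/passwd")’ would then give a description of the contents, with the exact wording depending on the installed magic database. A caller wanting something like Perl's -T and -B tests could look for the word “text” in that description, though that is only a rough approximation.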