Erik Max Francis wrote:
> The problem is that there are endless ways to do that, and figuring out
> all the cases makes `file` an sh interpreter, not the magic number
> detector it's supposed to be.

It makes it into a pattern matcher, not an interpreter, and that it already is. You are right that there are endless ways to do that, but only one, or a very small subset, is common. The way I cited is the one mentioned in the tclsh manpage, so it can (not must, but can) be regarded as the standard header of a Tcl script on Unix-like systems. Even if the construct sometimes differs a little, "file" should be able to identify it, since the manpage of "file" itself says that it is not a pure magic-number detector. The first part explains how the magic-number test works, and then what is done when that fails:

"If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. [...] Once file has determined the character set used in a text-type file, it will attempt to determine in what language the file is written. The language tests look for particular strings (cf names.h) that can appear anywhere in the first few blocks of a file. For example, the keyword .br indicates that the file is most likely a troff(1) input file, just as the keyword struct indicates a C program. These tests are less reliable than the previous two groups, so they are performed last."

As the manpage says, this is not the most reliable method, but it should work: if a statement with a continued comment line and the exec trick appears in the first few blocks, "file" can at least guess that it is a Tcl file. So this would be a valid way to detect that file type: a troff file is a troff file even when there is no .br in the first few blocks, yet "file" tries to identify it anyway. "file" is not meant to ignore troff files just because they are sometimes hard to detect. The same applies to Tcl files, I think. The test is not perfectly reliable, but it is worth a try, since this is, according to the tclsh manpage, the common header pattern for a Tcl script. So "file" cannot be perfect and reliable in every case, but it should try to make a good guess.

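As a rough illustration of such a guess, here is a minimal Python sketch. The regular expression, the function name looks_like_tcl and the 4096-byte limit are my own assumptions, not anything taken from "file" or its magic database. It only checks for a comment line ending in a backslash that is followed directly by an exec of tclsh.

```python
import re

# Hypothetical pattern (an assumption, not what "file" does):
# a comment line that ends in a backslash, followed directly by a
# line that execs something whose name contains "tclsh".
TCL_HEADER = re.compile(
    rb"^[ \t]*#[^\n]*\\\r?\n[ \t]*exec[ \t]+\S*tclsh",
    re.MULTILINE,
)


def looks_like_tcl(data: bytes, limit: int = 4096) -> bool:
    """Guess whether the first `limit` bytes contain the Tcl exec header.

    `limit` stands in for the "first few blocks"; 4096 is an arbitrary
    choice made for this sketch.
    """
    return TCL_HEADER.search(data[:limit]) is not None


if __name__ == "__main__":
    sample = (
        b"#!/bin/sh\n"
        b"# the next line restarts using tclsh \\\n"
        b'exec tclsh "$0" "$@"\n'
    )
    print(looks_like_tcl(sample))
    print(looks_like_tcl(b"#!/bin/sh\necho hello\n"))
```

Like the .br test for troff, this is only a guess: a Tcl script that uses a different header is missed, which is the trade-off the manpage already accepts for its language tests.
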
If you do not care about the Tcl headers (and why should you, this is comp.lang.python... ;-), you are right with your reasoning. But if you accept that "file" cannot be perfect anyway and want it to be as good as possible, then it is some kind of bug or missing feature in "file" that it recognizes (or tries to recognize) some morphing file formats but not another one that is fairly widespread, even if Tcl is not a modern buzzword language these days.

Stephan