[racket-users] [ANN] Racket implementation of magic language

Jonathan Simpson Wed, 31 Jul 2019 18:54:41 -0700

#lang magic is my implementation of the mini language used by the Unix file 
command. I'm aiming for compatibility with Ian Darwin's version 
<https://www.darwinsys.com/file/>, found in most Linux and BSD 
distributions. #lang magic is a work in progress. It is missing a lot of 
functionality but still has enough to be useful.


For the curious, 'man magic' describes the magic language in considerable, 
but not exhaustive, detail. A code sample to check for Microsoft 
executables provides the flavor of the language:

# MS Windows executables are also valid MS-DOS executables
0           string  MZ
>0x18       leshort <0x40
>>(4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
>0x18       leshort >0x3f
>>(0x3c.l)  string  PE\0\0  PE executable (MS-Windows)
>>>&0       leshort 0x14c   for Intel 80386
>>>&0       leshort 0x184   for DEC Alpha
>>>&0       leshort 0x8664  for AMD64
>>(0x3c.l)  string  LX\0\0  LX executable (OS/2)
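
For readers new to the notation, an indirect offset such as (0x3c.l) means: read a little-endian 32-bit value at offset 0x3c and use that value as the offset for the test. Here is a hand-written sketch of that single step in plain Racket. It is not what #lang magic generates; pe-signature? and the fabricated input are made up for illustration, and treating the stored offset as unsigned is an assumption.

```racket
#lang racket/base

;; Sketch of the test `(0x3c.l) string PE\0\0`: follow the little-endian
;; 32-bit offset stored at 0x3c, then compare four bytes at that offset.
;; pe-signature? is a hypothetical name, not part of #lang magic.
(define (pe-signature? bs)
  (and (>= (bytes-length bs) #x40)
       ;; signed? = #f (assumed unsigned), big-endian? = #f
       (let ([offset (integer-bytes->integer bs #f #f #x3c #x40)])
         (and (<= (+ offset 4) (bytes-length bs))
              (bytes=? (subbytes bs offset (+ offset 4)) #"PE\0\0")))))

;; Fabricated input: zero padding, the offset 0x40 stored at 0x3c, and the
;; signature placed at 0x40. A real executable has much more in its header.
(define fake
  (bytes-append (make-bytes #x3c 0)
                (integer->integer-bytes #x40 4 #f #f)
                #"PE\0\0"))

(define fake-is-pe? (pe-signature? fake))
```

The length checks matter because a short or truncated file would otherwise raise an error instead of simply failing the test.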

The magic sample above compiles to one magic query. A #lang magic Racket 
module consists of one or more such queries. A new query starts on any line 
without a leading '>'. Every #lang magic module provides two functions: 
magic-query and magic-query-run-all. These functions are thunks that can be 
passed to with-input-from-file to test the file against the queries in the 
module. magic-query replicates the default behavior of the file command. It 
stops and returns true after the first query match or returns false if no 
queries match. A query matches if the test on its first line succeeds. A 
matched query still runs to completion, but failures in its later tests do 
not change the result: the query counts as a match once the first test 
passes. magic-query-run-all, on the other hand, always tests the 
file against every query in the module. Both functions print the messages 
for each successful test to the current output port. This is a brief 
summary, so consult 'man magic' for a complete explanation.
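
As a rough illustration of that calling convention, here is a sketch in plain Racket that does not use #lang magic at all. make-query, mz-query, gif-query, first-match and run-all are hypothetical names: the first three are hand-written stand-ins for compiled queries, and the last two only imitate the behavior described above for magic-query and magic-query-run-all. The real generated code may look quite different.

```racket
#lang racket/base

;; Hypothetical stand-in for one compiled query. The result is a thunk that
;; tests the bytes at offset 0 of (current-input-port); on success it prints
;; the message to the current output port and returns #t, otherwise #f.
(define (make-query magic message)
  (lambda ()
    (define in (current-input-port))
    (file-position in 0)                 ; every query starts at offset 0
    (define header (read-bytes (bytes-length magic) in))
    (cond [(equal? header magic) (printf "~a\n" message) #t]
          [else #f])))

(define mz-query  (make-query #"MZ"   "MZ executable (MS-DOS)"))
(define gif-query (make-query #"GIF8" "GIF image data"))
(define queries (list mz-query gif-query))

;; Imitates magic-query: stop at the first matching query, #f if none match.
(define (first-match)
  (for/or ([q (in-list queries)]) (q)))

;; Imitates magic-query-run-all: always try every query. The return value
;; is not described in this post, so this sketch leaves it unspecified.
(define (run-all)
  (for ([q (in-list queries)]) (q)))

;; A byte string stands in for a file to keep the sketch self-contained.
(define matched?
  (parameterize ([current-input-port (open-input-bytes #"GIF89a")])
    (first-match)))
```

Because first-match and run-all take no arguments and read from the current input port, either one could equally be passed to with-input-from-file, as in (with-input-from-file path first-match).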

I wrote #lang magic to use in my gopher server. Gopher is a simple TCP 
protocol for exchanging documents. Gopher directories have simple one 
character flags to indicate file type. I wanted something that would be 
more robust than simply relying on file extensions. Here's an example of 
how I call into #lang magic to do that:

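(require racket/path) ; filename-extension (already included by #lang racket)
;; Each module below is written in #lang magic; rename its magic-query on import.
(require (only-in "magic/image.rkt" (magic-query image-query)))
(require (only-in "magic/gif.rkt" (magic-query gif-query)))
(require (only-in "magic/html.rkt" (magic-query html-query)))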

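;; Map a path to a one-character gopher type flag.
;; (is-utf8-text? is a helper that is not shown here.)
(define (filetype path)
  (define extension (filename-extension path)) ; bytes without the dot, or #f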

  (cond [(directory-exists? path) "1"]
        [(with-input-from-file path image-query) "I"]
        [(with-input-from-file path gif-query) "g"]
        [(with-input-from-file path html-query) "h"]
        [(is-utf8-text? path) "0"]
        [extension
         (cond [(or (bytes=? extension #"txt")
                    (bytes=? extension #"conf")
                    (bytes=? extension #"cfg")
                    (bytes=? extension #"sh")
                    (bytes=? extension #"bat")
                    (bytes=? extension #"ini")) "0"]
               [(or (bytes=? extension #"wav")
                    (bytes=? extension #"ogg")
                    (bytes=? extension #"mp3")) "s"]
               [else "9"])]
        [else "9"]))

The require'd .rkt files are written in #lang magic. I've kept the 
extension check as a fallback for now. Eventually I will add additional 
magic to detect audio files and other file types supported by gopher.

I still have a lot of work to do. This is my first project using Scheme 
macros, much less Racket's language building facilities, so I'm sure my 
code is far from optimal. Most of the macros in my current code need 
revising and I'd like to rewrite most of them with syntax/parse. I know my 
lexer could be improved as well. One reason I'm making the code public now 
is to gather feedback and advice for improvement.

I'm currently running Racket 6.11 on Linux, so that is the only platform 
I've tested it on. I plan to test on Windows and a current version of 
Racket in the near future. Please let me know if you have problems using 
this on another platform. If I know it doesn't work for someone, it will 
push up the priority of testing on other platforms.

I couldn't have gotten as far as I have without the generous help of 
members of this group. Many thanks to everyone who has helped me directly 
or contributed in any way to official or unofficial Racket documentation. I 
even owe the genesis of this project to this group, this thread 
<https://groups.google.com/d/topic/racket-users/wZMkXk33XxQ/discussion> in 
particular.

I'd love to know if anyone else has a use for this. So please post here if 
you do! I will happily accept any suggestions, ideas, or feedback.

-- Jonathan
