I've been trying to make a tokenizer/lexer for a project of mine and came up with the following code. I've modelled the stream of characters as a lazy seq of chars, which is then converted to a lazy seq of token objects. I'm relatively happy with how idiomatic and functional the code seems. However, when benchmarked, it takes about 30 seconds on Clojure (after I increase the heap to 1 GB) to process a 30 MB file, and over 1 minute 30 seconds with ClojureScript. This is in contrast to about 0.1 to 0.5 seconds or less in C. Is there any idiomatic way to process the file without being roughly 100 times slower than C?
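To make the measurement concrete, here is a minimal sketch of how such a timing could be set up. It is not the exact harness behind the numbers above: char-seq, timed-count and dummy-token-seq are hypothetical names, and the dummy lexer simply emits one token per character. The point it illustrates is that a lazy seq has to be fully realized inside the timed region, otherwise almost no work is measured.

```clojure
;; Hypothetical helper: lazily turn a java.io.Reader into a seq of chars.
(defn char-seq
  [^java.io.Reader rdr]
  (lazy-seq
    (let [c (.read rdr)]            ; .read returns -1 at end of input
      (when-not (neg? c)
        (cons (char c) (char-seq rdr))))))

;; Hypothetical stand-in for the real lexer: one token per character.
(defn dummy-token-seq
  [cs]
  (map (fn [c] [:char c]) cs))

;; Forces every element of (f input) via count and returns
;; [element-count elapsed-milliseconds].
(defn timed-count
  [f input]
  (let [start (System/nanoTime)
        n     (count (f input))     ; count walks the whole lazy seq
        ms    (/ (- (System/nanoTime) start) 1e6)]
    [n ms]))

;; Example use on a small in-memory input instead of a 30 MB file.
(with-open [rdr (java.io.StringReader. "int x = 42;")]
  (println (timed-count dummy-token-seq (char-seq rdr))))
```

The timing call sits inside with-open so the reader is still open while the seq is being realized.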
Also, is there a tool for Clojure similar to gprof for C?

Each function takes in a char seq and returns both a token and the seq after it's been advanced.

(defn match-ident [cs]
  (let [start (first cs)]
    (if (ident-first-char? start)
      (let [identseq (cons start (take-while ident-tail-char? (rest cs)))
            ^String ident (apply str identseq)]
        [(drop (.length ident) cs) [:ident ident]]))))

(defn match-num [cs]
  (if (digit? (first cs))
    (let [numseq (take-while digit? cs)
          ^String numstr (apply str numseq)
          retseq (drop (.length numstr) cs)]
      (if (= (first retseq) \.)
        nil
        [retseq [:number numstr]]))))

(defn match-ws [cs]
  (if (whitespace-char? (first cs))
    (let [wsseq (take-while whitespace-char? cs)
          ^String wsstr (apply str wsseq)
          retseq (drop (.length wsstr) cs)]
      [retseq [:ws wsstr]])))

...

(defn next-token [cs]
  (or (match-ident cs)
      (match-ws cs)
      (match-punct cs)
      (match-num cs)
      (match-eof cs)
      (match-unknown cs)))

;; Here I build the lazy seq of tokens.
(defn token-seq [cs]
  (let [[newcs tok] (next-token cs)]
    (lazy-seq (cons tok (token-seq newcs)))))

Cheers,
Andrew Chambers

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en