Well, I did it. Made implementing lazy seqs that require stateful generator functions easy, that is:
(defn files-and-dirs-recursive [dir] (super-lazy-seq [stack [(to-file dir)]] (if-not (empty? stack) (let [file (first stack) s (rest stack)] (next-item file (if (directory? file) (concat s (files-and-dirs file)) s)))))) now works for me. Notice how tidy the body is. The if-not defaults to nil if the stack, initially a vector, is empty, and the body of super- lazy-seq evaluating to nil indicates that the sequence is fully consumed (as when the body of plain lazy-seq evaluates to nil). The body has a similar form to a loop/recur, with a binding at the start, and next-item roughly in the role of recur. It takes one extra parameter, though, the item generated as the next in the sequence. The above has only four external dependencies: super-lazy-seq, to- file, directory?, and files-and-dirs. The latter three are fairly simple: (defn to-file [str-or-file] (if (instance? File str-or-file) str-or-file (File. str-or-file))) (defn exists? [file] (. (to-file file) (isFile))) (defn directory? [file] (. (to-file file) (isDirectory))) (defn files-and-dirs [dir] (seq (. (to-file dir) (listFiles (proxy [FileFilter] [] (accept [file] (or (exists? file) (directory? file)))))))) with the latter returning a (non-lazy) seq of the files and directories in a directory using java.io functionality. The real magic is in the super-lazy-seq macro, which handles all of the plumbing to let you write a lazy-seq generator in the loop/recur style shown above. Keep in mind, this is tested, and works. (defn bindings-to-atoms [bindings] (vec (interleave (take-nth 2 bindings) (map #(identity `(atom ~%)) (take-nth 2 (rest bindings)))))) Helper function that, in our example, changes [stack [(to-file dir)]] into [stack (atom [(to-file dir)])] so that the stack can be changed each iteration, as with loop/recur bindings. (defn map-code [f coll] (if (map? coll) (loop [m {} s coll] (if (empty? s) m (let [e (first s) k (first e) v (second e)] (recur (assoc m (apply f [k]) (apply f [v])) (rest s))))) (let [x (map f coll)] (if (vector? coll) (vec x) (if (set? coll) (set x) x))))) This seems like a huge waste. You're probably thinking "(into (empty coll) (map f coll))". But guess what: that doesn't work. It reverses lists, so you probably think to slap on a reverse. Reverse on a vector produces a list again, so reverse on the into won't work. Reverse on the map does, except that into on a vector DOESN'T reverse it, so any simple construct with into is going to reverse either lists or vectors. It will also fail on maps, though work on sets. On maps, the function will be applied to the key-value pairs. For what we're doing, code transformation, we want the keys and values separately subjected to the map function. The above map-code has no such problem. It still preserves the type of the collection. It preserves the order for both lists and vectors. It processes the keys and values of a map separately. For most other uses you might want a modified version of the above that has (recur (assoc m (apply f [k]) (apply f [v])) (rest s))))) changed to (recur (assoc m k (apply f [v])) (rest s))))) so that it only maps on the values, but we're using this for code massaging, which means the expressions that evaluate to the keys in map literals also need to be processed by the function. (defn with-derefs [names body] (if (coll? body) (map-code #(with-derefs names %) body) (if (and (symbol? body) (contains? names body)) (list 'deref body) body))) And here's the sweet little fn that uses map-code. It changes all occurrences of unqualified "foo" in the passed-in code to "(deref foo)" if "foo" is in the "names" set. This is used to change the body code in super-lazy-seq so that uses of the "loop bindings" dereference the atoms the bindings are actually stored in because of how bindings- to-atoms modifies the bindings so that they can be rebound each time around the loop. (defn make-atom-setter-fn-expr [names] (let [arglist (vec (cons (gensym 'item) (map #(gensym %) names)))] `(~'next-item ~arglist ~@(map (fn [nm gs] (list 'reset! nm gs)) names (rest arglist)) [~(first arglist)]))) This beaut is what binds next-item locally inside the body code in a super-lazy-seq. It creates a list: (next-item [lots-of-gensyms] (reset! name1 gs2) (reset! name2 gs3) (reset! name3 gs4) ... [gs1]) This, if used in a letfn say, will locally bind next-item to a function that takes n + 1 arguments, resets an ordered list of atoms to the second and onwards, and returns the first wrapped in a vector. This, of course, is how we get our recur-like construct that rebinds the loop bindings in super-lazy-seq to their new values. It also does a sneaky thing: makes the body evaluate to nil if it evaluates to nil, but evaluate to a vector of one item (possibly nil) if it ends by returning the result of evaluating (next-item foo bar baz). Notice that we haven't even gotten to the macro yet and we have quite a lot of back-tick and unquote and unquote-splicing going on. The ~'next-item is particularly interesting: it makes sure next-item stays unqualified, because for once we actually do want capture. And now for the pièce de résistance: (defmacro super-lazy-seq [bindings & body] (let [binds (bindings-to-atoms bindings) code (with-derefs (set (take-nth 2 bindings)) body)] `(let ~binds (letfn [~(make-atom-setter-fn-expr (take-nth 2 bindings)) (helper-fn# [] (lazy-seq (let [item# (do ~...@code)] (if item# (cons (first item#) (helper-fn#))))))] (helper-fn#))))) First let is curiously meta: it binds "binds" to something like [var1 (atom expr1) var2 (atom expr2)], i.e. it binds "binds" to a binds- vector in the right format to use in some code. It also binds "code" to a massaged version of the loop body code, altered to deref those same atoms. Then the back-quoted macro result begins, with the make- atom-setter-fn-expr evaluated on the binding names. The modified loop body is actually shoved into a bind itself, inside the lazy-seq body, that assigns what it evaluates to to a gensym, which is then queried. If it's not nil, it is expected to be a one-item vector whose item is used in the cons expression that generates the lazy-seq body's value. The default nil if item# was nil ends the lazy-seq when it occurs. Note that (next-item false foo bar baz) and (next-item nil foo) etc. will correctly put false or nil into the output sequence rather than terminate the sequence; that is why the vector wrap/unwrap. Here's what the macro does to the example at the top of this post: user=> (macroexpand-1 '(super-lazy-seq [stack [(to-file dir)]] (if-not (empty? stack) (let [file (first stack) s (rest stack)] (next-item file (if (directory? file) (concat s (files-and-dirs file)) s)))))) (clojure.core/let [stack (clojure.core/atom [(to-file dir)])] (clojure.core/letfn [(next-item [item15430 stack15431] (reset! stack stack15431) [item15430]) (helper-fn__14936__auto__ [] (clojure.core/ lazy-seq (clojure.core/let [item__14937__auto__ (do (if-not (empty? (deref stack)) (let [file (first (deref stack)) s (rest (deref stack))] (next-item file (if (directory? file) (concat s (files-and- dirs file)) s)))))] (if item__14937__auto__ (clojure.core/cons (clojure.core/first item__14937__auto__) (helper- fn__14936__auto__))))))] (helper-fn__14936__auto__))) That ugly mess, with the clojue.core/ cruft removed and nice indentation, is: (let [stack (atom [(to-file dir)])] (letfn [(next-item [item15430 stack15431] (reset! stack stack15431) [item15430]) (helper-fn__14936__auto__ [] (lazy-seq (let [item__14937__auto__ (do (if-not (empty? (deref stack)) (let [file (first (deref stack)) s (rest (deref stack))] (next-item file (if (directory? file) (concat s (files-and-dirs file)) s)))))] (if item__14937__auto__ (cons (first item__14937__auto__) (helper-fn__14936__auto__))))))] (helper-fn__14936__auto__))) I've separated out the part that comes from the loop body code. It's close in form to my original recursive all-files-and-dirs-as-a-lazy- sequence code. Note that the [stack [(to-file dir)]] bind ends up in a let as [stack (atom [(to-file dir)]]. The next-item function calls reset! stack second-arg and then returns the first arg wrapped in a vector. So the stack is rebound each time around the loop, just like with loop/recur, with super-lazy-seq/next-item. And notice all the (deref stack)s in the body code. Note that the body code includes vector as well as list forms and was still massaged correctly, with (deref stack) appearing twice in a vector -- that's the power of the recursive with-derefs and the collection-type-preserving map-code. The body code could have used set and map literals and stack would have changed to (deref stack) in these, too, even if used in an expression that evaluated to a map *key*. In fact, map-code might be very useful in implementing macros that do serious manipulations of the passed-in code. Here's the original non-super-lazy-seq implementation, for comparison: (defn files-and-dirs-recursive [dir] (let [stack (atom [(to-file dir)])] (letfn [(lfr-helper [] (lazy-seq (if-not (empty? @stack) (cons (let [file (first @stack)] (swap! stack rest) (if (not (directory? file)) file (do (swap! stack concat (files-and-dirs file)) file))) (lfr-helper))))))] (lfr-helper)))) One situation in which super-lazy-seq will fail is when the body internally shadows one of the bindings: (defn files-and-dirs-recursive [dir] (super-lazy-seq [stack [(to-file dir)]] (if-not (empty? stack) (let [file (first stack) stack (rest stack)] (next-item file (if (directory? file) (concat stack (files-and-dirs file)) stack)))))) would have failed with the "stack" in the inner "let" and the "stack"s in the final if turned into "(deref stack)"s, blowing everything up. The with-derefs function isn't smart enough to know about scope. It could not really be made so, either; it could be made to check for let, loop, and a few other common binding situations but not really for arbitrary macros that expand into such. Even with this flaw, super-lazy-seq is useful. It sure did simplify that directory-walking seq-maker. And it also demonstrates the power of clojure macros. A lazy loop/recur, indeed! --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---