Well, I did it. Made implementing lazy seqs that require stateful
generator functions easy, that is:

(defn files-and-dirs-recursive [dir]
  (super-lazy-seq [stack [(to-file dir)]]
    (if-not (empty? stack)
      (let [file (first stack) s (rest stack)]
        (next-item
          file
          (if (directory? file)
            (concat s (files-and-dirs file))
            s))))))

now works for me. Notice how tidy the body is. The if-not defaults to
nil if the stack, initially a vector, is empty, and the body of super-
lazy-seq evaluating to nil indicates that the sequence is fully
consumed (as when the body of plain lazy-seq evaluates to nil). The
body has a similar form to a loop/recur, with a binding at the start,
and next-item roughly in the role of recur. It takes one extra
parameter, though, the item generated as the next in the sequence.

The above has only four external dependencies: super-lazy-seq, to-
file, directory?, and files-and-dirs. The latter three are fairly
simple:

(defn to-file [str-or-file]
  (if (instance? File str-or-file) str-or-file (File. str-or-file)))

(defn exists? [file]
  (. (to-file file) (isFile)))

(defn directory? [file]
  (. (to-file file) (isDirectory)))

(defn files-and-dirs [dir]
  (seq (. (to-file dir)
         (listFiles (proxy [FileFilter] []
                      (accept [file] (or
                                       (exists? file)
                                       (directory? file))))))))

with the latter returning a (non-lazy) seq of the files and
directories in a directory using java.io functionality.

The real magic is in the super-lazy-seq macro, which handles all of
the plumbing to let you write a lazy-seq generator in the loop/recur
style shown above. Keep in mind, this is tested, and works.

(defn bindings-to-atoms [bindings]
  (vec
    (interleave
      (take-nth 2 bindings)
      (map #(identity `(atom ~%)) (take-nth 2 (rest bindings))))))

Helper function that, in our example, changes [stack [(to-file dir)]]
into [stack (atom [(to-file dir)])] so that the stack can be changed
each iteration, as with loop/recur bindings.

(defn map-code [f coll]
  (if (map? coll)
    (loop [m {} s coll]
      (if (empty? s)
        m
        (let [e (first s) k (first e) v (second e)]
          (recur (assoc m (apply f [k]) (apply f [v])) (rest s)))))
    (let [x (map f coll)]
      (if (vector? coll)
        (vec x)
        (if (set? coll)
          (set x)
          x)))))

This seems like a huge waste. You're probably thinking "(into (empty
coll) (map f coll))". But guess what: that doesn't work. It reverses
lists, so you probably think to slap on a reverse. Reverse on a vector
produces a list again, so reverse on the into won't work. Reverse on
the map does, except that into on a vector DOESN'T reverse it, so any
simple construct with into is going to reverse either lists or
vectors. It will also fail on maps, though work on sets. On maps, the
function will be applied to the key-value pairs. For what we're doing,
code transformation, we want the keys and values separately subjected
to the map function.

The above map-code has no such problem. It still preserves the type of
the collection. It preserves the order for both lists and vectors. It
processes the keys and values of a map separately. For most other uses
you might want a modified version of the above that has

          (recur (assoc m (apply f [k]) (apply f [v])) (rest s)))))

changed to

          (recur (assoc m k (apply f [v])) (rest s)))))

so that it only maps on the values, but we're using this for code
massaging, which means the expressions that evaluate to the keys in
map literals also need to be processed by the function.

(defn with-derefs [names body]
  (if (coll? body)
    (map-code #(with-derefs names %) body)
    (if (and (symbol? body) (contains? names body))
      (list 'deref body)
      body)))

And here's the sweet little fn that uses map-code. It changes all
occurrences of unqualified "foo" in the passed-in code to "(deref
foo)" if "foo" is in the "names" set. This is used to change the body
code in super-lazy-seq so that uses of the "loop bindings" dereference
the atoms the bindings are actually stored in because of how bindings-
to-atoms modifies the bindings so that they can be rebound each time
around the loop.

(defn make-atom-setter-fn-expr [names]
  (let [arglist (vec (cons (gensym 'item) (map #(gensym %) names)))]
    `(~'next-item
      ~arglist
      ~@(map (fn [nm gs] (list 'reset! nm gs)) names (rest arglist))
      [~(first arglist)])))

This beaut is what binds next-item locally inside the body code in a
super-lazy-seq. It creates a list:

(next-item [lots-of-gensyms] (reset! name1 gs2) (reset! name2 gs3)
(reset! name3 gs4) ... [gs1])

This, if used in a letfn say, will locally bind next-item to a
function that takes n + 1 arguments, resets an ordered list of atoms
to the second and onwards, and returns the first wrapped in a vector.
This, of course, is how we get our recur-like construct that rebinds
the loop bindings in super-lazy-seq to their new values. It also does
a sneaky thing: makes the body evaluate to nil if it evaluates to nil,
but evaluate to a vector of one item (possibly nil) if it ends by
returning the result of evaluating (next-item foo bar baz).

Notice that we haven't even gotten to the macro yet and we have quite
a lot of back-tick and unquote and unquote-splicing going on. The
~'next-item is particularly interesting: it makes sure next-item stays
unqualified, because for once we actually do want capture.

And now for the pièce de résistance:

(defmacro super-lazy-seq [bindings & body]
  (let [binds (bindings-to-atoms bindings)
        code (with-derefs (set (take-nth 2 bindings)) body)]
    `(let ~binds
       (letfn [~(make-atom-setter-fn-expr (take-nth 2 bindings))
               (helper-fn# []
                 (lazy-seq
                   (let [item# (do ~...@code)]
                     (if item#
                       (cons (first item#) (helper-fn#))))))]
         (helper-fn#)))))

First let is curiously meta: it binds "binds" to something like [var1
(atom expr1) var2 (atom expr2)], i.e. it binds "binds" to a binds-
vector in the right format to use in some code. It also binds "code"
to a massaged version of the loop body code, altered to deref those
same atoms. Then the back-quoted macro result begins, with the make-
atom-setter-fn-expr evaluated on the binding names. The modified loop
body is actually shoved into a bind itself, inside the lazy-seq body,
that assigns what it evaluates to to a gensym, which is then queried.
If it's not nil, it is expected to be a one-item vector whose item is
used in the cons expression that generates the lazy-seq body's value.
The default nil if item# was nil ends the lazy-seq when it occurs.
Note that (next-item false foo bar baz) and (next-item nil foo) etc.
will correctly put false or nil into the output sequence rather than
terminate the sequence; that is why the vector wrap/unwrap.

Here's what the macro does to the example at the top of this post:

user=> (macroexpand-1 '(super-lazy-seq [stack [(to-file dir)]]
    (if-not (empty? stack)
      (let [file (first stack) s (rest stack)]
        (next-item
          file
          (if (directory? file)
            (concat s (files-and-dirs file))
            s))))))
(clojure.core/let [stack (clojure.core/atom [(to-file dir)])]
(clojure.core/letfn [(next-item [item15430 stack15431] (reset! stack
stack15431) [item15430]) (helper-fn__14936__auto__ [] (clojure.core/
lazy-seq (clojure.core/let [item__14937__auto__ (do (if-not (empty?
(deref stack)) (let [file (first (deref stack)) s (rest (deref
stack))] (next-item file (if (directory? file) (concat s (files-and-
dirs file)) s)))))] (if item__14937__auto__ (clojure.core/cons
(clojure.core/first item__14937__auto__) (helper-
fn__14936__auto__))))))] (helper-fn__14936__auto__)))

That ugly mess, with the clojue.core/ cruft removed and nice
indentation, is:

(let [stack (atom [(to-file dir)])]
  (letfn [(next-item [item15430 stack15431]
            (reset! stack stack15431)
            [item15430])
          (helper-fn__14936__auto__ []
            (lazy-seq
              (let [item__14937__auto__

(do
  (if-not (empty? (deref stack))
    (let [file (first (deref stack)) s (rest (deref stack))]
      (next-item file
        (if (directory? file)
          (concat s (files-and-dirs file))
          s)))))]

                (if item__14937__auto__
                  (cons
                    (first item__14937__auto__)
                    (helper-fn__14936__auto__))))))]
     (helper-fn__14936__auto__)))

I've separated out the part that comes from the loop body code. It's
close in form to my original recursive all-files-and-dirs-as-a-lazy-
sequence code. Note that the [stack [(to-file dir)]] bind ends up in a
let as [stack (atom [(to-file dir)]]. The next-item function calls
reset! stack second-arg and then returns the first arg wrapped in a
vector. So the stack is rebound each time around the loop, just like
with loop/recur, with super-lazy-seq/next-item. And notice all the
(deref stack)s in the body code. Note that the body code includes
vector as well as list forms and was still massaged correctly, with
(deref stack) appearing twice in a vector -- that's the power of the
recursive with-derefs and the collection-type-preserving map-code. The
body code could have used set and map literals and stack would have
changed to (deref stack) in these, too, even if used in an expression
that evaluated to a map *key*. In fact, map-code might be very useful
in implementing macros that do serious manipulations of the passed-in
code.

Here's the original non-super-lazy-seq implementation, for comparison:

(defn files-and-dirs-recursive [dir]
  (let [stack (atom [(to-file dir)])]
    (letfn [(lfr-helper []
             (lazy-seq
               (if-not (empty? @stack)
                 (cons
                   (let [file (first @stack)]
                     (swap! stack rest)
                     (if (not (directory? file))
                       file
                       (do
                         (swap! stack concat (files-and-dirs file))
                         file)))
                   (lfr-helper))))))]
      (lfr-helper))))

One situation in which super-lazy-seq will fail is when the body
internally shadows one of the bindings:

(defn files-and-dirs-recursive [dir]
  (super-lazy-seq [stack [(to-file dir)]]
    (if-not (empty? stack)
      (let [file (first stack) stack (rest stack)]
        (next-item
          file
          (if (directory? file)
            (concat stack (files-and-dirs file))
            stack))))))

would have failed with the "stack" in the inner "let" and the "stack"s
in the final if turned into "(deref stack)"s, blowing everything up.
The with-derefs function isn't smart enough to know about scope. It
could not really be made so, either; it could be made to check for
let, loop, and a few other common binding situations but not really
for arbitrary macros that expand into such.

Even with this flaw, super-lazy-seq is useful. It sure did simplify
that directory-walking seq-maker. And it also demonstrates the power
of clojure macros. A lazy loop/recur, indeed!
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to