Re: Questions & workaround regarding Clojure, true laziness, and nil-terminated sequences

Chouser Sat, 03 Jan 2009 22:22:05 -0800

On Wed, Dec 24, 2008 at 2:45 AM, Jason <jawo...@berkeley.edu> wrote:
>
> I've been trying to incorporate lazy seqs into my project, but kept
> running into troubles where things were evaluated that I didn't think
> should be.  After thinking a bit and looking into the Clojure source
> for the first time, I think I've realized the source of the issue: the
> empty seq == nil.  This means that every "lazy" function or macro that
> returns a seq -- for, lazy-cat, filter, etc. -- must always do enough
> pre-emptive evaluation to determine whether the return value should be
> nil or an object.   So, my first question is, is this analysis
> correct?  If not, please enlighten me (and either way, maybe consider
> devoting a FAQ question to this...).


First, I'd like to point out that it seems very rare that one needs to
understand lazy seqs to the level of detail discussed in the rest of
this post.  Generally, built in functions don't consume any more than
they need, and in fact the most natural implementations of many
user-created functions don't either.  When functions do take more from
a seq than strictly necessary, this cost is hardly ever high enough to
matter.

That said, what you're suggesting goes to the heart of the seq
abstraction, and so I thought would be worth my time to try to
understand exactly what's going on.  I find it a bit difficult to
express the concepts involved -- I don't know if this a failure of
vocabulary or general fogginess in my thinking, but I'll try to get
across what I've found.  Please excuse the sluggish pace and
verbosity.

Consider a lazy seq that would print as:

(alice bob)

It is true that in order to get the 'rest' of the above seq, you'd
have to determine if there is a "bob" vs. that "alice" is the end of
the seq.  But you do NOT have to compute the value of "bob".  The fact
that both the 'first' and 'rest' parts of 'lazy-cons' are delayed is
crucial.

To help experiment with these concepts I wrote a function that reports
each step required of a seq you give it:

(defn report-seq [msg coll]
  (lazy-cons
    (do (println "(first" msg (first coll) ")") (first coll))
    (do (println "(rest after" msg (first coll) ")")
      (when (rest coll)
        (report-seq msg (rest coll))))))

The function can be used like this:

user=> (empty? (rest (report-seq "friend" '(alice bob))))
(first friend alice )
(rest after friend alice )
false

This shows that using the normal builtin seq functions 'rest' and
'empty?', a minimal amount of the seq is consumed.  "alice" is
computed and cached.  Then the 'rest' is called by our expression, and
this is report as "(rest after friend alice)"  But the value "bob" is
not required in order for 'empty?' to make its determination, so "bob"
is not computed and thus no "(first friend bob)" is seen.

> Finally, the workaround.  A little background: I was trying to build a
> lazy stack data structure (where entire lazy seqs, incidentally
> involving a filter or (for :when) could be pushed on the front), and
> found that elements kept being unexpectedly evaluated.  In particular,
> any element ever at the top of the stack was always evaluated, even if
> it was never popped and was subsequently buried by other elements.

Without seeing your implementation I can't know exactly what was
causing your extra seq consumption.  In order to try to understand your
dilemma, I created my own stack API based on your description.  Here's
what I came up with:

(defn single-pop [mstk]
  (if (rfirst mstk)
    (cons (rfirst mstk) (rest mstk))
    (rest mstk)))

(defn single-peek [mstk]
  (ffirst mstk))

(defn push-seq [mstk coll]
  (if (empty? coll)
    mstk
    (cons coll mstk)))

A populated mstk might look like: ((5) (4 3) (2 1))
An empty mstk is just an empty list or nil.  These functions guarantee
that no empty list or nil is ever in the mstk, so 'single-peek' can be
guaranteed to find the next item right where it expects it.

Now lets put a report-seq on an mstk and see when things get
evaluated:

(defn test-stack []
  (-> ()
    (push-seq (report-seq "" [2 1]))
    (push-seq [3])))

We would hope that doing a single-peek on a test-stack would return 3.
If we pop nothing, we really shouldn't see anything demanded from the
report-seq.

user=> (-> (test-stack) (single-peek))
3

Good enough, but that doesn't prove much.  So let's pop once, then
peek.  This should return the number 2, so we should see a report that
it had to be produced:

user=> (-> (test-stack) (single-pop) (single-peek))
(first  2 )
2

And indeed we do.  Note that the "rest after 2" was not reported,
which is good.  However it could be that "first 2" was demanded too
early, the moment that 2 was at the top of the stack, instead of
waiting until 'single-peek' was called.  To prove this was not that
case, we can pop the 3 off (causing 2 to be at the top of the stack)
and then push something else on instead.

user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-peek))
9

We know 2 must have been at the top of the stack, but none of the
functions involved caused it to be computed.  Just to make sure this
isn't an edge-case, let's pop one level deeper.

user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-pop)
(single-pop) (single-peek))
(first  2 )
(rest after  2 )
(first  1 )
1

Popping off the 2 and then doing a peek of course requires that 1 be
computed.  But again note that no extra work was done to determine if
1 was the end of the seq or not.  Similarly, if we don't actually ask
for the value at the top of the stack, but simply ask if the seq is
empty, the value 1 isn't computed:

user=> (-> (test-stack) (single-pop) (push-seq [9]) (single-pop)
(single-pop) (empty?))
(first  2 )
(rest after  2 )
false

> I wanted to avoid this extra evaluation, without introducing "delay"
> and "force" everywhere, or writing my own versions of the built-in seq
> functions.

Unless I've completely misunderstood your scenario, I think this
demonstrates that the seq abstraction as it currently exists is
sufficient for providing the kind of laziness you need.

--Chouser

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Questions & workaround regarding Clojure, true laziness, and nil-terminated sequences

Reply via email to