On Jan 8, 11:22 pm, "Mark Engelberg" <mark.engelb...@gmail.com> wrote:
> On Thu, Jan 8, 2009 at 5:55 AM, Rich Hickey <richhic...@gmail.com> wrote:
> > When you
> > ask for the nth element/rest, you get the identical item every time.
>
> I know this is nitpicking, but if this is really the promise that the
> seq abstraction is supposed to fulfill, then Clojure already violates
> this:
> (def a (seq [1 2 3]))
> (identical? (rest  a) (rest a)) yields false
>
> Presumably what you mean is that the nth element should be identical
> (but not necessarily the nth rest).
>

No, I meant what I said, but it is true that ArraySeq and Range
'cheat' (for performance), and are thus broken in the identity sense.

> > The sequence abstraction in Clojure is the list abstraction of Lisp.
> > No more, no less.
>
> Well, in Lisp, lists are concrete, finite things that are mutable (and
> therefore don't provide any kind of guarantee that you'll get the same
> element with multiple traversals).  So to say that Clojure's sequence
> abstraction is no more and no less than the list abstraction of Lisp
> is a bit misleading.

True, they are both less and more, in being immutable.

> You have a certain concept of what properties of
> this abstraction you want to model, and what promises you want to
> keep.  And that's great -- honestly, I think it's fantastic that you
> have such a clear, coherent design vision that you've maintained
> throughout the creation of Clojure.  I'm just saying that your
> interpretation of what constitutes an abstraction of a Lisp list is
> more subjective than your statement implies.
>

Yes. Please don't read anything I say as being 'the only way to
conceive of things is this/my way'.

> I'm pretty sure my concept of the list abstraction differs from yours.
>  In my mind the essential abstract concept of a list is anything that
> can be explained in this way:
> A list has a first and a rest, where first is some kind of element,
> and rest is another list or nil.
>

Actually I understand both concepts. The point I was trying to make is
that there is an abstraction in Clojure and it is not that one, and
swapping out caching breaks it.

I had exactly the same idea, evidenced by the fact that I had
implemented it and tested it. So, I don't think it is a bad idea. But
it does break from Clojure's current promises. What I think is
important is that any proposed substitution be clear about what its
promises are, and what tradeoffs are involved.

> In other words, I would say a sequence abstraction should apply to
> anything that has this sort of nested, self-referential recursive
> structure.
>

It could, and that's what lazy-seq implements.
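For instance, a recursively self-similar sequence can be expressed directly. This is a sketch using today's lazy-seq macro, which caches once realized; the lazy-seq under discussion in this thread was a non-caching variant, and `step` is just an illustrative helper, not part of any Clojure API:

```clojure
;; Whole numbers as a self-referential structure: a first, and a
;; rest that is another such sequence.
(defn step [n]
  (lazy-seq (cons n (step (inc n)))))

(take 5 (step 0)) ; => (0 1 2 3 4)
```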

> There is an elegance to the way that algorithms on self-referential
> data structures can be expressed via recursion.  To me, a big part of
> Clojure's beauty lies in the recognition that so many things can be
> converted to this view.  Strictly speaking, a vector is not structured
> in this way, but you can convert it into something that has this
> property, allowing you to program vectors with this sort of recursive
> elegance.
>
> When I think about something like the sequence of whole numbers, it
> seems like a perfect embodiment of what I consider to be the essential
> abstraction of a sequence.  You've got a first thing, and the rest
> thing is another sequence that has similarity to the original.
> Because it has a recursive structure, I want to be able to operate on
> it using first and rest.  To me, a sequence feels like the "right
> abstraction" for the whole numbers, and writing algorithms that
> operate on them.  And this feels like the right abstraction to me
> regardless of implementation details such as whether the traversed
> numbers should reside in memory all at once.
>

This is really a quite different problem, having to do with the
'laziness' of the collection itself. If (defn whole-numbers []
(iterate inc 0)) is too unbearable, you could attain this right now
with something like this:

(defn fn-coll [f]
  (proxy [clojure.lang.IPersistentCollection clojure.lang.Sequential] []
    (seq [] (seq (f)))
    (count [] (count (seq (f))))
    (cons [x] (cons x (seq (f))))
    (empty [] nil)))

(def whole-numbers (fn-coll #(iterate inc 0)))

Where fn-coll defines an on-demand collection created by a generating
function.

> > Making seqs 'non-caching' makes no sense because they are no longer
> > the same abstraction.
>
> I get that your design vision revolves around the idea that sequences
> promise identity, and that there are real technical challenges
> surrounding the idea of non-cached sequences.  But I don't consider
> "non-caching sequences" to be an oxymoron.

I don't either - I'm sorry if that seemed to imply that. What is
probably more correct is - non-caching seq functions would not yield
sequences according to the current seq abstraction.
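To make the distinction concrete, here is a sketch in present-day Clojure, where lazy seqs do cache their realized values; `gen` is an illustrative name, not an existing function:

```clojure
;; A cached lazy seq realizes its element once and yields the
;; identical object on every subsequent access:
(def cached (lazy-seq (cons (Object.) nil)))
(identical? (first cached) (first cached)) ; => true

;; A bare generating function, by contrast, rebuilds on each call,
;; so there is no identity promise:
(defn gen [] (cons (Object.) nil))
(identical? (first (gen)) (first (gen))) ; => false
```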

> If it were possible to
> implement them in a way that meshed with the rest of Clojure's design,
> I think there would be real value to being able to manipulate
> non-cached recursive structures with the existing sequence interface.
> It's clear you don't think this fits with Clojure's design, but I
> haven't yet given up hope of finding a clever way to do this, and
> trying to convince you that it is worthwhile :) .
>

Well, you can manipulate non-cached seqs through the interface; as
you've shown, ArraySeq is non-caching. The real question is - can the
seq library itself be made non-caching?

> I know that over email, it is all too common for the tone of a post to
> be misconstrued.  So just in case, I want to emphasize here that these
> posts are in no way intended as bashing Clojure or Rich's design
> sensibilities.  It is precisely because I admire the design, and
> already care about Clojure, that I raise these points in the hope that
> community discussion can elevate the language even further.  I strive
> to make my comments as constructive in tone as possible, and I hope
> that they are perceived in that light.

Same here, you've been quite reasonable.

That said, I think it is critical when proposing a change to think
through all use cases, not just your own, and all tradeoffs: to be
explicit about what the new promises will be, i.e. what the new
abstraction is, and to double-check that it will provide the presumed
benefits.

Since I had already done this, with lazy-seq/cached-seq, let me
further describe my thoughts.

The idea is much as you've requested - reduce the promise of seqs to
be merely functionally generative, not necessarily persistent. There
is no way for them to guarantee referential integrity, as their source
might not; or, if that were to be the abstraction, enforcing it becomes
a user problem. In short, the sequence functions won't cache - they'll
use lazy-seq instead of lazy-cons.
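The current caching behavior is directly observable; a sketch (assuming today's caching map, where the mapped function runs once per element no matter how many times the seq is traversed):

```clojure
;; Count invocations of the mapped function across two traversals.
(def calls (atom 0))
(def s (map (fn [x] (swap! calls inc) x) (range 3)))
(dorun s)
(dorun s)
@calls ; => 3, not 6: the second traversal reused the cached values
```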

Users creating seqs from ephemeral or expensive-to-calculate sources
will have to cache them explicitly. There are currently many such
usages. Some are obviously I/O based, others more subtle - (map #(new
Foo %) ...), (map (agent/ref x) ...), i.e. any identity/reference-
generating function. It ends up being quite a handy and pragmatic
thing that Clojure does, taking all this non-functional stuff and
turning it into persistent data (with referential integrity)
automatically. I'm certain the number of people who care about that
far exceeds those who share your expectations for (def whole-numbers
(iterate inc 0)). People have a tough enough time understanding
laziness - e.g. why (map print ...) doesn't do what they expect. Now
they will need to know when they have to cache in order to produce a
reliable sequence.
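A sketch of the kind of breakage at stake (in today's Clojure, where seqs do cache, re-traversal of the same seq is reliable; `make-refs` is an illustrative name):

```clojure
;; Mapping an identity-generating function: caching makes repeated
;; access to the same seq yield the same references.
(def refs (map (fn [i] (atom i)) (range 3)))
(identical? (nth refs 1) (nth refs 1)) ; => true

;; Re-running the generating expression, however, produces fresh
;; identities each time - which is what every traversal would look
;; like under a non-caching scheme:
(defn make-refs [] (map (fn [i] (atom i)) (range 3)))
(identical? (nth (make-refs) 1) (nth (make-refs) 1)) ; => false
```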

There are tradeoffs even for functional programmers, and very subtle
ones. You see, LazySeq needs to keep its generating function (usually
a closure) in order to generate new first/rest on demand. For many
recursive algorithms, that closure will close over the prior head when
creating the rest. This results in retaining the entire list anyway
during traversal (i.e. even when not retaining the head!) due to this
closure back-chain. LazyCons avoids this by nulling out its generating
function once used, and can do this strictly because it caches.

So, many tradeoffs. What about the benefits? Could the core library
even promise that no seq fn will cache? If so, you could then build
named top-level seq values like whole-numbers that you know won't
retain the seq. But I don't think the library in general can make that
guarantee, due to the back-chain problems. So you'd have to qualify
each fn as caching or not.

And the seq abstraction won't guarantee that no seq will ever be
cached - many need to be, and concrete lists are inherently so. So it
doesn't solve the 'take care not to retain the head' problem for
general-purpose functions like filter, which must work correctly for
cached/concrete seqs. There is already aggressive clearing of locals
on tail calls that removes the vast majority of cases of local head
retention.

Any other benefits? Well, (def whole-numbers (iterate inc 0)) will
work the way you want :) There are slight speed benefits, but the
comparison is not between head-retaining and not, but between
non-head-retaining caching and not, and my tests showed the caching
overhead to be about 10%.

So, in the end I saw few benefits and many tradeoffs, and still do.

Rich

You received this message because you are subscribed to the Google Groups "Clojure" group.