On Thu, May 5, 2011 at 4:41 PM, Alan <a...@malloys.org> wrote:
> Right. But if I drop all references to the returned function after I'm
> done with it, it gets GCed. If there's some class holding a reference
> to it forever, it will never get cleaned up. For example, ((fnmaker4
> (range)) 1e6) will (I think?) currently result in a million ints being
> held in memory at once, as things are now. Those things will get
> thrown away shortly thereafter, though.

True. We only need to worry about those integers being held if we keep
a reference to the returned function somewhere.

Under my externalizability proposal, the fnmaker4 instance would hold
a reference to that instance of (range) in its metadata, but it
already holds such a reference in an instance variable somewhere. If
the fnmaker4 instance becomes unreachable, the GC will collect it and
the (range) instance even with the metadata. Only if it's externalized
is there an issue, and that splits into two cases.

Case 1 is not real "externalization": it is (eval `(some stuff
~the-fnmaker4-instance more stuff)) and suchlike. Nothing has to be
converted back into code there; eval can just embed a reference to
the fnmaker4 instance directly into the generated class. As long as
things are configured such that whole classes can be unloaded when no
longer in use, then once the return value from eval is discarded and
becomes eligible for GC, the (range) instance becomes GCable too.

Case 2 is actual conversion of the closure into code that can recreate
it. The easy way to handle a closed-over lazy seq is to disallow lazy
seqs outright, or to walk them to some point and reject any that
exceed some length/byte limit, but that's kind of icky.
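
A minimal sketch of that "walk it to some point and reject" check, just
to make the idea concrete. The name realize-within and the choice of an
element count (rather than a byte limit) are my own assumptions, not
part of any existing API:

```clojure
;; Hypothetical helper: realize at most (inc limit) elements of s.
;; Returns the elements as a vector if s has no more than limit
;; elements, or nil if s is longer (including infinite seqs).
(defn realize-within
  [s limit]
  (let [prefix (take (inc limit) s)]
    (when (<= (count prefix) limit)
      (vec prefix))))

(realize-within (range 5) 10)   ; small finite seq: accepted
(realize-within (range) 10)     ; infinite seq: rejected, returns nil
```

Note that chunked seqs may realize a few more elements than the limit
behind the scenes, so the limit bounds the check, not memory use exactly.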

Alternatively, we can have lazy seq externalization be done by
externalizing the unrealized seq -- that is, the fact of a lazy seq
with some particular generator function. That gives us another
function to externalize. Let's consider the output of

(map #(* % %) (range))

-- it turns out that this is a LazySeq object, which wraps a thunk in
much the same spirit as delay and force. Somewhere in there is a fn
that is closed over (range) and #(* % %) and generates each successive
element of the lazy sequence. Externalizing that requires externalizing
(range) and #(* % %). The latter is trivial -- it's not even closed
over anything. The former is a specially-implemented lazy sequence
instance and that implementation can just externalize using print-dup
as #=(range), #=(range 3), #=(range 3 7), or the like.
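
To illustrate the point that an infinite lazy seq can be described by a
finite piece of code, here is a rough sketch. It is not the proposed
mechanism: I am simply starting from the quoted form by hand (the names
recipe, externalized and rebuilt are made up for the example), whereas
the proposal would have to recover that form from the live object:

```clojure
;; The "recipe" for the seq, as plain data. Nothing is realized here.
(def recipe '(map (fn [x] (* x x)) (range)))

;; Externalize: a finite string, even though the seq is infinite.
(def externalized (pr-str recipe))

;; Recreate the lazy seq elsewhere from that string.
(def rebuilt (eval (read-string externalized)))

(take 5 rebuilt)   ; => (0 1 4 9 16)
```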

The general pattern will be that lazy sequences amount to a special
kind of closure over, frequently, an input sequence, plus some other
values, often including more functions. The regress stops when it hits
something implemented the way range is, or an explicit use of the
lazy-seq macro, or similar. But the lazy-seq macro also just wraps
calls to a step function that is generally closed over something.
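
For concreteness, a hand-written lazy seq of the kind I mean. The name
squares-from is made up; the point is only that the body handed to
lazy-seq closes over n, so recreating the seq needs nothing more than
the function and that one value:

```clojure
;; Each step closes over n; lazy-seq defers the body until needed.
(defn squares-from
  [n]
  (lazy-seq (cons (* n n) (squares-from (inc n)))))

(take 3 (squares-from 3))   ; => (9 16 25)
```

So an externalized form of that seq could be as small as the code
(squares-from 3), provided squares-from exists on the other side.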

I'm fairly confident that it can, in principle, be done. Of course, it
won't always be doable -- a line-seq for example will not be
externalizable by the default means because eventually somewhere in
its guts is some function that has closed over a file handle.
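
A quick way to see the problem, sketched with a hypothetical helper
round-trips? that checks whether a value survives pr-str followed by
read-string. I test the reader itself rather than a line-seq, since
printing a line-seq would just realize it and print the strings:

```clojure
(import '(java.io BufferedReader StringReader))

;; Hypothetical helper: does x come back equal after printing and
;; reading? Any failure to read counts as "no".
(defn round-trips?
  [x]
  (try
    (= x (read-string (pr-str x)))
    (catch Exception _ false)))

;; Stand-in for a file handle; a real file reader behaves the same way.
(def rdr (BufferedReader. (StringReader. "a\nb\n")))

(round-trips? [1 2 3])   ; plain data comes back intact
(round-trips? rdr)       ; the reader does not
```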
