Transients question/feedback

Moss Prescott Mon, 01 Feb 2010 01:39:09 -0800

Hi,

As a very green Clojure user and a big fan of persistent data
structures, I'm struggling to grasp the significance of transients in
Clojure 1.1. In particular, the implementation seems to be less safe
and less consistent than it could be.


Consider this somewhat silly REPL session:

user=> (def t (transient []))
#'user/t
user=> (def t2 (conj! t 1))
#'user/t2
user=> (def t3 (conj! t2 2))
#'user/t3
user=> (persistent! t)
[1 2]

This corresponds to an equivalent interaction with a persistent vector
until the last line, when the original vector is made persistent, and
is found to contain the elements that were conj!-ed onto the later
versions! The problem is that each conj! operation mutates the data
structure, but leaves the old reference to it around:

user=> t
#<TransientVector clojure.lang.PersistentVector$transientvec...@55dec1dd>
user=> t2
#<TransientVector clojure.lang.PersistentVector$transientvec...@55dec1dd>
user=> t3
#<TransientVector clojure.lang.PersistentVector$transientvec...@55dec1dd>

(note: all three names refer to the same object)

This means that a properly "functional" vector-building sequence will
do the right thing (that is, the same thing it would do with a
persistent vector), but if your code is not "correct" you get not a
failure but, silently, a different result than you would get with a
persistent vector. In the above example, the final result is [1 2]
instead of []. I think a non-silly
example could probably be constructed, where this kind of thing
happens for a better reason (say, if you build a vector by visiting
some other data structure, get to the end and for some reason end up
returning the version before you added the last element -- which might
be the easiest way to do it in some cases and works perfectly well if
you're using regular persistent vectors).
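
For what it's worth, here is a rough sketch of the kind of case I have
in mind. The function names are made up for illustration, and the
transient result assumes conj! on a transient vector keeps returning
the same object, as in the session above:

```clojure
;; Hypothetical example: build a vector while remembering the version
;; from before the most recent conj, then return that older version.
(defn all-but-last-persistent [coll]
  (loop [v [], prev [], xs (seq coll)]
    (if xs
      (recur (conj v (first xs)) v (next xs))
      prev)))

;; The same shape with a transient. `prev` and `v` end up naming the
;; same mutable object, so the "older version" is not actually older.
(defn all-but-last-transient [coll]
  (loop [v (transient []), prev v, xs (seq coll)]
    (if xs
      (recur (conj! v (first xs)) v (next xs))
      (persistent! prev))))

(all-but-last-persistent [1 2 3]) ; => [1 2]
(all-but-last-transient [1 2 3])  ; => [1 2 3]
```

The two functions differ only in using the transient operations, yet
they return different vectors and neither one throws.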

My question is: why not have each conj! produce a new, lightweight
TransientVector instance pointing to the same shared data? The
previous instance would be marked as dead, in the same way as the
single instance is currently made a zombie when persistent! is finally
called. The result would be that as soon as the program deviated from
single-path usage of the transient, an exception would immediately be
thrown. This would signal the programmer that either the code has a
bug or it simply isn't suited to use transients.
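
To make the idea concrete, here is a minimal sketch written in plain
Clojure on top of the existing transients rather than inside
TransientVector itself. The names strict-transient, strict-conj! and
strict-persistent! are hypothetical (nothing like them exists in
Clojure), and a real version would presumably live in the Java
implementation:

```clojure
;; Hypothetical wrapper: each handle shares the underlying transient
;; but carries its own "alive" flag.
(defn strict-transient [coll]
  {:data (transient coll) :alive (atom true)})

(defn- claim!
  "Marks the handle dead; throws if it was already dead."
  [{:keys [alive]}]
  (when-not (compare-and-set! alive true false)
    (throw (IllegalStateException.
            "Transient handle used after being superseded"))))

(defn strict-conj! [handle x]
  (claim! handle)
  ;; New lightweight handle pointing at the same shared data.
  {:data (conj! (:data handle) x) :alive (atom true)})

(defn strict-persistent! [handle]
  (claim! handle)
  (persistent! (:data handle)))

(def h  (strict-transient []))
(def h2 (strict-conj! h 1))
(def h3 (strict-conj! h2 2))
;; (strict-persistent! h)  ; throws IllegalStateException: h is dead
(strict-persistent! h3)    ; => [1 2]
```

With this, the silly session from the top of this message fails at the
point where the stale handle is used instead of returning [1 2].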

My guess is that the resulting ephemeral garbage would have only a
small effect on performance, retaining most of the advantage of
transients, but improving their safety. Would any of the Clojure
experts care to comment on whether this seems like a worthwhile
exercise?

I don't think it would be hard for someone more familiar with the
implementation to prototype this, and I'd be very interested in a
report of how well it works and any performance impact. If no one
bites, I'll probably eventually get around to giving it a try, in
which case I'll definitely post my observations.

- moss
