On Mon, Feb 1, 2010 at 2:42 AM, Moss Prescott <m...@theprescotts.com> wrote:
> Hi,
>
> As a very green Clojure user and a big fan of persistent data
> structures, I'm struggling to grasp the significance of transients in
> Clojure 1.1. In particular, the implementation seems to be less safe
> and less consistent than it could be.
>
> Consider this somewhat silly REPL session:
>
> user=> (def t (transient []))
> #'user/t
> user=> (def t2 (conj! t 1))
> #'user/t2
> user=> (def t3 (conj! t2 2))
> #'user/t3
> user=> (persistent! t)
> [1 2]
>
> This corresponds to an equivalent interaction with a persistent vector
> until the last line, when the original vector is made persistent, and
> is found to contain the elements that were conj!-ed onto the later
> versions! The problem is that each conj! operation mutates the data
> structure, but leaves the old reference to it around:
>
> user=> t
> #<TransientVector clojure.lang.PersistentVector
> $transientvec...@55dec1dd>
> user=> t2
> #<TransientVector clojure.lang.PersistentVector
> $transientvec...@55dec1dd>
> user=> t3
> #<TransientVector clojure.lang.PersistentVector
> $transientvec...@55dec1dd>
>
> (note: all three names refer to the same object)
>
> This means that although a properly "functional" vector-building
> sequence will do the right thing (that is, the same thing it would do
> with a persistent vector), if your code is not "correct" you get not a
> failure, but a different result than you would otherwise. In the above
> example, the final result is [1,2] instead of []. I think a non-silly
> example could probably be constructed, where this kind of thing
> happens for a better reason (say, if you build a vector by visiting
> some other data structure, get to the end and for some reason end up
> returning the version before you added the last element -- which might
> be the easiest way to do it in some cases and works perfectly well if
> you're using regular persistent vectors).
>
> My question is: why not have each conj! produce a new, lightweight
> TransientVector instance pointing to the same shared data? The
> previous instance would be marked as dead, in the same way as the
> single instance is currently made a zombie when persistent! is finally
> called. The result would be that as soon as the program deviated from
> single-path usage of the transient, an exception would immediately be
> thrown. This would signal the programmer that either the code has a
> bug or it simply isn't suited to use transients.
>
> My guess is that the resulting ephemeral garbage would have only a
> small effect on performance, retaining most of the advantage of
> transients, but improving their safety. Would any of the Clojure
> experts care to comment on whether this seems like a worthwhile
> exercise?
>
> I don't think it would be hard for someone more familiar with the
> implementation to prototype this, and I'd be very interested in a
> report of how well it works and any performance impact. If no one
> bites, I'll probably eventually get around to giving it a try, in
> which case I'll definitely post my observations.
>

The point of transients is to compete with the fastest of mutable data
structures, which often do no allocation. Allocating on every
modification will hurt you in that comparison.

OTOH, I recognize the safety issue. However, it falls into the same
category as using a file after closing it: you are using a transient
persistently, and that's not going to work. That said, we use wrapper
objects to manage things like resource lifetime for us, and I am
working on the design of a new reference type that wraps transients.
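
To make the "single-path" discipline concrete, here is a minimal sketch
of the intended usage pattern. The names build-vec, dead, and
misuse-result are made up for this illustration and are not part of
Clojure. The first part always uses the value returned by conj! and
never touches an older reference; the second part shows the
"file after closing it" case, where a transient is used after
persistent! has been called on it.

```clojure
;; Single-path use: each conj! result is threaded into the next step,
;; and only the final value is made persistent.
(defn build-vec
  "Hypothetical helper: builds the vector [0 1 ... n-1] via a transient."
  [n]
  (loop [i 0
         t (transient [])]
    (if (< i n)
      (recur (inc i) (conj! t i))   ; always continue with the returned value
      (persistent! t))))

(build-vec 3) ;=> [0 1 2]

;; Misuse: touching a transient after persistent! is expected to throw.
;; Throwable is caught here because the failure may be an Error rather
;; than an Exception; the exact class is not relied on in this sketch.
(def dead (transient []))
(persistent! dead)

(def misuse-result
  (try
    (conj! dead 1)
    :no-error
    (catch Throwable e
      :threw)))
```

Note that this only catches use after persistent!. Reusing an older
reference before persistent!, as in the REPL session quoted above, is
not detected.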

Stay tuned!

Rich
