On Mon, Feb 1, 2010 at 2:42 AM, Moss Prescott <m...@theprescotts.com> wrote:
> Hi,
>
> As a very green Clojure user and a big fan of persistent data
> structures, I'm struggling to grasp the significance of transients in
> Clojure 1.1. In particular, the implementation seems to be less safe
> and less consistent than it could be.
>
> Consider this somewhat silly REPL session:
>
> user=> (def t (transient []))
> #'user/t
> user=> (def t2 (conj! t 1))
> #'user/t2
> user=> (def t3 (conj! t2 2))
> #'user/t3
> user=> (persistent! t)
> [1 2]
>
> This corresponds to an equivalent interaction with a persistent vector
> until the last line, when the original vector is made persistent, and
> is found to contain the elements that were conj!-ed onto the later
> versions! The problem is that each conj! operation mutates the data
> structure, but leaves the old reference to it around:
>
> user=> t
> #<TransientVector clojure.lang.PersistentVector
> $transientvec...@55dec1dd>
> user=> t2
> #<TransientVector clojure.lang.PersistentVector
> $transientvec...@55dec1dd>
> user=> t3
> #<TransientVector clojure.lang.PersistentVector
> $transientvec...@55dec1dd>
>
> (note: all three names refer to the same object)
>
> This means that although a properly "functional" vector-building
> sequence will do the right thing (that is, the same thing it would do
> with a persistent vector), if your code is not "correct" you get not a
> failure, but a different result than you would otherwise. In the above
> example, the final result is [1,2] instead of []. I think a non-silly
> example could probably be constructed, where this kind of thing
> happens for a better reason (say, if you build a vector by visiting
> some other data structure, get to the end and for some reason end up
> returning the version before you added the last element -- which might
> be the easiest way to do it in some cases and works perfectly well if
> you're using regular persistent vectors).
>
> My question is: why not have each conj! produce a new, lightweight
> TransientVector instance pointing to the same shared data? The
> previous instance would be marked as dead, in the same way as the
> single instance is currently made a zombie when persistent! is finally
> called. The result would be that as soon as the program deviated from
> single-path usage of the transient, an exception would immediately be
> thrown. This would signal the programmer that either the code has a
> bug or it simply isn't suited to use transients.
>
> My guess is that the resulting ephemeral garbage would have only a
> small effect on performance, retaining most of the advantage of
> transients, but improving their safety. Would any of the Clojure
> experts care to comment on whether this seems like a worthwhile
> exercise?
>
> I don't think it would be hard for someone more familiar with the
> implementation to prototype this, and I'd be very interested in a
> report of how well it works and any performance impact. If no one
> bites, I'll probably eventually get around to giving it a try, in
> which case I'll definitely post my observations.
>
The point of transients is to compete with the fastest of mutable data
structures, which often do no allocation. Allocating on every
modification will hurt you in that comparison.

OTOH, I recognize the safety issue. However, it falls into the same
category as using a file after closing it: you are using a transient
persistently, and that's not going to work.

That said, we use wrapper objects to manage things like resource
lifetime for us, and I am working on the design of a new reference type
that wraps transients.

Stay tuned!

Rich
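
For readers following the thread, here is a minimal sketch of the two approaches under discussion. It is an illustration only. `build-vec`, `guarded`, `g-conj!`, and `g-persistent!` are hypothetical names, not part of Clojure, and the guarded handle is just one way the "mark the previous instance dead" proposal could look. It is not the reference type Rich mentions. `compare-and-set!` was chosen here as a simple one-shot liveness check.

```clojure
;; Single-path use: always thread the value returned by conj!,
;; never hold on to an earlier reference.
(defn build-vec
  "Collects the items of coll into a vector via a transient."
  [coll]
  (persistent! (reduce conj! (transient []) coll)))

;; Hypothetical guarded handle (illustrative names, not a Clojure API).
;; Every handle shares the same underlying transient but carries its
;; own one-shot liveness flag.
(defn guarded
  "Wraps transient t in a fresh, live handle."
  [t]
  {:t t :live (atom true)})

(defn- claim!
  "Marks handle h dead and returns its transient.
  Throws if h has already been used."
  [h]
  (when-not (compare-and-set! (:live h) true false)
    (throw (IllegalStateException. "Stale transient handle used")))
  (:t h))

(defn g-conj!
  "Like conj!, but kills h and returns a new handle.
  Note the extra allocation on every call."
  [h x]
  (guarded (conj! (claim! h) x)))

(defn g-persistent!
  "Like persistent!, but only succeeds on a handle that is still live."
  [h]
  (persistent! (claim! h)))
```

With handles like these, the REPL session from the original message would throw at the final step instead of returning [1 2], because the first handle was already consumed by the first `g-conj!`. The price is the per-modification allocation that the reply above argues against.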