On Mar 21, 2010, at 8:29 AM, Andrzej wrote:

On Sun, Mar 21, 2010 at 6:37 PM, Jarkko Oranen <chous...@gmail.com> wrote:

Rich has done some work on using the JDK7 ForkJoin Library to
parallelise map and reduce over vectors, since they are already
internally structured as trees. It hasn't been touched in a while, but
as far as I know the code lives in the 'par' branch of the clojure
repository, and works with JDK6 if you install an additional jar.

Thank you. I'll keep an eye on it.


You really do need to look at par, and at fork/join as well. In particular, your desire for even partitioning and balancing is obviated by work stealing. If you think about it, you'll see that work stealing would become necessary anyway as soon as you have non-uniform cost-per-operation at the leaves, i.e. perfect partitioning would then no longer be ideal either.

Rich
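
For concreteness, here is a minimal sketch of the fork/join pattern being discussed, written against JDK7's java.util.concurrent classes rather than the par branch itself (the par branch targets the separate jsr166y jar on JDK6). fj-reduce, fj-pool and the 1024-element threshold are made-up names and numbers for illustration only; the point is that the splits need not be even, because idle workers steal queued subtasks and imbalance sorts itself out.

(import '(java.util.concurrent ForkJoinPool RecursiveTask))

(def fj-pool (ForkJoinPool.))

(defn fj-reduce
  "Reduce vector v with an associative function f by recursively
   halving it into fork/join tasks; small pieces are reduced
   sequentially."
  [f v]
  (let [task (fn task [v]
               (proxy [RecursiveTask] []
                 (compute []
                   (if (<= (count v) 1024)
                     (reduce f v)
                     (let [mid   (quot (count v) 2)
                           left  (task (subvec v 0 mid))
                           right (task (subvec v mid))]
                       (.fork right)            ; queue the right half; a free worker may steal it
                       (f (.invoke left)        ; do the left half on this thread
                          (.join right)))))))]  ; wait for the (possibly stolen) right half
    (.invoke fj-pool (task v))))

;; (fj-reduce + (vec (range 1000000)))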

Yesterday I looked at the implementation of the PersistentVector
class, trying to figure out how to exploit its internal structure to
decompose the vector. I hit several issues, though:

1. The trees are not balanced. So in order to split the vector roughly
in half, one would have to deal with the root node _and_ nodes one
level below it.
2. The Node structure doesn't have a "cnt" field, but PersistentVector
does. When splitting the vector, the length of each part would have to
be calculated, which could be costly. Assuming there are no holes in
the tree, it might be enough to look at the depth of the tree and the
number of fully occupied slots in the root node.
3. Apparently PersistentVector doesn't allow an empty tail array (for
lengths > 0), i.e. the tail always holds 1 to 32 elements. This would
have to be changed (to allow 0 to 31) if the partitioning is going to
be fast.
4. Should partitioning stop once the vector is decomposed down to a
single chunk (32 values)? (A sketch of chunk-level splitting follows
this list.) Decomposing vectors down to single values would be more
convenient to use but would only add overhead without any reduction in
access time.
5. With all this we can get subvectors with very fast access times,
_but_ the decomposition process itself can still trigger a lot of
allocation (each subvector needs at least one new node, its root). So
I'm not sure whether this would give any net performance gain; perhaps
not.
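
To make point 4 concrete, here is a minimal sketch that stays entirely outside the class and uses only the public API; chunked-splits is a made-up name, and subvec stands in for real trie decomposition. It cuts the vector into 32-element slices that line up with the leaf arrays, with the final slice covering the tail. Each slice is only a small index-offset wrapper (subvec is O(1)), though element access still goes through the whole trie rather than straight to a leaf array, which is exactly the cost that real internal decomposition would avoid.

(defn chunked-splits
  "Split v into 32-element subvectors aligned with the trie's leaf
   arrays; the last subvector is the 1..32-element tail."
  [v]
  (let [n       (count v)
        ;; index where the tail begins, mirroring PersistentVector.tailoff():
        ;; ((cnt - 1) >>> 5) << 5 for cnt >= 32, else 0
        tailoff (if (< n 32) 0 (bit-shift-left (bit-shift-right (dec n) 5) 5))]
    (concat
     (map #(subvec v % (+ % 32)) (range 0 tailoff 32))   ; full leaf-sized slices
     (when (< tailoff n) [(subvec v tailoff n)]))))      ; the tail portion

;; (map count (chunked-splits (vec (range 100))))  ;=> (32 32 32 4)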

Andrzej

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
