On Sat, Mar 20, 2010 at 5:43 PM, Michał Marczyk
<michal.marc...@gmail.com> wrote:
> Well, you're calling subvec about twice as many times (I guess) as
> there are elements in your input vector. Then you're calling count at
> least once for each of the intermediate vectors (and twice for those
> which will be split further). Even with O(1) operations like those,
> that's certainly going to slow you down significantly if you don't
> derive a big concurrency-related benefit from it.

I've checked that although the overhead of "count" is negligible,
"subvec" causes a significant slowdown, probably due to the memory
allocation. As the subvec doesn't really do any destructuring of the
data collection (it acts as a simple proxy to the subsequent data
access operations) we can do the same without it (and faster):

(defn- sum_tree_int2 [val vec start cnt]
  (if (= cnt 1)
    (+ val (nth vec start))
    (let [cnt_half (quot cnt 2)
          s2 (+ start cnt_half)]
      (recur (sum_tree_int2 val vec start cnt_half)
             vec
             s2
             (- cnt cnt_half)))))

(defn sum_tree2 [vec]
  (sum_tree_int2 0 vec 0 (count vec)))

user> (dotimes [_ 5] (time (sum_seq (vec (range 100000)))))
"Elapsed time: 58.992694 msecs"
"Elapsed time: 60.766941 msecs"
"Elapsed time: 61.650573 msecs"
"Elapsed time: 63.162777 msecs"
"Elapsed time: 64.05898 msecs"
nil
user> (dotimes [_ 5] (time (sum_tree (vec (range 100000)))))
"Elapsed time: 1469.829266 msecs"
"Elapsed time: 1472.129838 msecs"
"Elapsed time: 1472.557825 msecs"
"Elapsed time: 1474.831577 msecs"
"Elapsed time: 1486.996811 msecs"
nil
user> (dotimes [_ 5] (time (sum_tree2 (vec (range 100000)))))
"Elapsed time: 90.365117 msecs"
"Elapsed time: 82.10876 msecs"
"Elapsed time: 81.573217 msecs"
"Elapsed time: 83.333776 msecs"
"Elapsed time: 84.892913 msecs"
nil

I'm happy with these results so there is there is no need to dwell on
it any longer.

> An example implementation:
>
> (defn chunked-commutative-vector-reduce [n f v]
>  (let [cnt (count v)
>        split-points (concat (range 0 cnt (/ cnt n)) [cnt])
>        subvs (map #(subvec v %1 %2) split-points (rest split-points))]
>    (reduce f (pmap #(reduce f %) subvs))))

Thanks, that's what I was going to do next. You're certainly right
about using fewer, more coarse-grained partitions. BTW, you can check
the number of available processors using (.. Runtime getRuntime
availableProcessors) (that's how pmap is doing it) but ultimately it
might be better to let the user or a system administrator decide about
it.

> Oh, one more thing. The presentation by Guy Steele at ICFP 2009,
> "Organizing Functional Code for Parallel Execution; or, foldl and
> foldr Considered Slightly Harmful", is interesting in this context;
> links to the presentation video as well as the slides deck are
> available here:
>
> http://lambda-the-ultimate.org/node/3616

I was just looking for this presentation! Thank you.

Cheers,
Andrzej

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

To unsubscribe from this group, send email to 
clojure+unsubscribegooglegroups.com or reply to this email with the words 
"REMOVE ME" as the subject.

Reply via email to