On Sat, Mar 20, 2010 at 5:43 PM, Michał Marczyk <michal.marc...@gmail.com> wrote: > Well, you're calling subvec about twice as many times (I guess) as > there are elements in your input vector. Then you're calling count at > least once for each of the intermediate vectors (and twice for those > which will be split further). Even with O(1) operations like those, > that's certainly going to slow you down significantly if you don't > derive a big concurrency-related benefit from it.
I've checked that although the overhead of "count" is negligible, "subvec" causes a significant slowdown, probably due to the memory allocation. As the subvec doesn't really do any destructuring of the data collection (it acts as a simple proxy to the subsequent data access operations) we can do the same without it (and faster): (defn- sum_tree_int2 [val vec start cnt] (if (= cnt 1) (+ val (nth vec start)) (let [cnt_half (quot cnt 2) s2 (+ start cnt_half)] (recur (sum_tree_int2 val vec start cnt_half) vec s2 (- cnt cnt_half))))) (defn sum_tree2 [vec] (sum_tree_int2 0 vec 0 (count vec))) user> (dotimes [_ 5] (time (sum_seq (vec (range 100000))))) "Elapsed time: 58.992694 msecs" "Elapsed time: 60.766941 msecs" "Elapsed time: 61.650573 msecs" "Elapsed time: 63.162777 msecs" "Elapsed time: 64.05898 msecs" nil user> (dotimes [_ 5] (time (sum_tree (vec (range 100000))))) "Elapsed time: 1469.829266 msecs" "Elapsed time: 1472.129838 msecs" "Elapsed time: 1472.557825 msecs" "Elapsed time: 1474.831577 msecs" "Elapsed time: 1486.996811 msecs" nil user> (dotimes [_ 5] (time (sum_tree2 (vec (range 100000))))) "Elapsed time: 90.365117 msecs" "Elapsed time: 82.10876 msecs" "Elapsed time: 81.573217 msecs" "Elapsed time: 83.333776 msecs" "Elapsed time: 84.892913 msecs" nil I'm happy with these results so there is there is no need to dwell on it any longer. > An example implementation: > > (defn chunked-commutative-vector-reduce [n f v] > (let [cnt (count v) > split-points (concat (range 0 cnt (/ cnt n)) [cnt]) > subvs (map #(subvec v %1 %2) split-points (rest split-points))] > (reduce f (pmap #(reduce f %) subvs)))) Thanks, that's what I was going to do next. You're certainly right about using fewer, more coarse-grained partitions. BTW, you can check the number of available processors using (.. Runtime getRuntime availableProcessors) (that's how pmap is doing it) but ultimately it might be better to let the user or a system administrator decide about it. > Oh, one more thing. The presentation by Guy Steele at ICFP 2009, > "Organizing Functional Code for Parallel Execution; or, foldl and > foldr Considered Slightly Harmful", is interesting in this context; > links to the presentation video as well as the slides deck are > available here: > > http://lambda-the-ultimate.org/node/3616 I was just looking for this presentation! Thank you. Cheers, Andrzej -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en To unsubscribe from this group, send email to clojure+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.