I have some new data suggesting that there are issues inherent to
pmap, and possibly to other forms of parallelism in Clojure, on older
Intel machines with four or more cores.

I added a no-op loop to the benchmark. It looks like this:

(defn noops [n]
  (when (> n 0)
    (recur (- n 1))))

Running those in parallel is also no faster on the Xeon 5150 box with
four cores than it is with two. It has been suggested that memory
contention is the problem with this machine. I suspect Clojure's
overhead relative to Java is the reason that parallel Java benchmarks
get more out of the four cores on this machine, but don't quote me on
that.
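
For anyone who wants to try a similar measurement, here is a minimal
sketch of timing the no-op loop on a fixed number of threads. This is
not the code from the benchmark itself: run-noops-on-pool and its
parameters are names made up for illustration, and it uses a plain
java.util.concurrent thread pool rather than pmap so that the thread
count can be set explicitly.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

;; The no-op loop from the post, repeated so the sketch is self-contained.
(defn noops [n]
  (when (> n 0)
    (recur (- n 1))))

;; Hypothetical helper (not part of the benchmark): runs n-tasks calls of
;; (noops iterations) on a pool of n-threads threads and returns the
;; elapsed wall-clock time in milliseconds.
(defn run-noops-on-pool [n-threads n-tasks iterations]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)
        ;; Clojure functions implement Callable, so they can be submitted as-is.
        tasks (vec (repeat n-tasks #(noops iterations)))
        start (System/nanoTime)]
    (try
      ;; invokeAll blocks until every task has finished.
      (doseq [fut (.invokeAll pool ^java.util.Collection tasks)]
        (.get ^java.util.concurrent.Future fut))
      (/ (- (System/nanoTime) start) 1e6)
      (finally
        (.shutdown pool)))))
```

Comparing something like (run-noops-on-pool 2 8 10000000) against
(run-noops-on-pool 4 8 10000000) should show whether the extra cores
help on a given machine. The iteration count is arbitrary, and a few
warm-up runs are advisable before trusting the numbers.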

I had someone run the benchmarks on an 8-core Nehalem Mac Pro. Those
results are quite a bit different from mine. On the true factorial
benchmark, four threads are twice as fast as two. Eight threads are
50% faster than four, but 16 threads are about twice as fast as four.
Intermediate numbers are a bit variable, but it seems like
hyperthreading actually speeds things up quite a bit on this
benchmark. ka's version of fac, which I've renamed spin-mult, scales
linearly with the number of physical cores, but slows down with
between 9 and 15 threads. With 16 threads it is about as fast as with 8.

I've put the benchmarks up on GitHub: http://github.com/zakwilson/npmap

I'm going to try changing spin-mult to use dotimes and see how that
runs on several machines. Initial results on the Xeon 5150 box suggest
that using dotimes instead of recur solves the problem, and I'll
probably be changing the benchmarks to further explore the issue.
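
To illustrate the kind of change I mean, here is a dotimes version of
the no-op loop. It is only a sketch: spin-mult itself is not shown in
this message, so this rewrites noops instead, and noops-dotimes is a
name made up for illustration. One possible explanation for the
difference, which the data so far does not confirm, is that dotimes
keeps its counter as a primitive while recur on a function argument
works with boxed numbers.

```clojure
;; Hypothetical dotimes variant of the no-op loop (illustration only).
;; dotimes runs its body n times with a primitive loop counter; the body
;; here is empty, so the loop does no work other than counting.
(defn noops-dotimes [n]
  (dotimes [_ n]))
```

Like the recur version, (noops-dotimes 1000000) returns nil, so the two
can be swapped in a benchmark without other changes.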
