Johann, if you are still following this thread, could you try running
this Clojure program on your 8 core machine?

http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/misc/pmap-testing.clj

The first set of parameters below will do 8 jobs sequentially, each
doing 10^10 (inc c)'s, where c is a double primitive.  The second will
do the 8 jobs in parallel, hopefully finishing about 8 times faster on
your machine:

java -cp <your_class_path> clojure.main pmap-testing.clj double2 8 10000000000 1
java -cp <your_class_path> clojure.main pmap-testing.clj double2 8 10000000000 8

If you replace double2 with double1, it should reproduce your initial
test case with (inc 0.1) in the inner loop -- the one that started
this thread about why there wasn't much speedup.


I created some Clojure and Java functions that are as similar as I
know how to make them, but frankly I don't really know whether my JVM
implementation (Apple's java 1.6.0_13) is using 'new Double', or a
cache as mentioned by John Harrop earlier in this discussion, in its
implementation of Double.valueOf(double).  I've found that the
performance is very similar to a Java program that uses 'new Double'
explicitly in its inner loop.

In the results linked below, the 'sequential' runs call the named
function twice in a row, whereas the 'parallel' runs use two parallel
threads, on my 2 core machine, each of which calls the named function
once.  The total amount of computation in each case
should be the same, so you'd hope that the parallel case would finish
in about half the time, with about the same amount of total CPU time
being utilized.
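
To make the sequential/parallel setup concrete, here is a minimal Java
sketch of that kind of harness.  It is not the code from the repository
linked above; the class name JobRunnerSketch and the methods job and
runJobs are made up for illustration.  Passing threads=1 runs the jobs
one after another, and threads=N lets up to N of them run at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness, not taken from the linked repository.
final class JobRunnerSketch {

    // One job: add 1.0 to a primitive double 'iterations' times.
    static double job(long iterations) {
        double c = 0.0;
        for (long i = 0; i < iterations; i++) {
            c += 1.0;
        }
        return c;
    }

    // Runs 'jobs' identical jobs on a pool of 'threads' worker threads
    // and returns each job's result in submission order.
    static List<Double> runJobs(int jobs, long iterations, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int j = 0; j < jobs; j++) {
                tasks.add(() -> job(iterations));
            }
            List<Double> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<Double> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Wrapping runJobs(2, n, 1) and runJobs(2, n, 2) in System.nanoTime()
calls gives the elapsed-time comparison; total CPU time has to come
from somewhere else, such as the OS 'time' command.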

That isn't what happens for the Java code with 'new Double' (called
NewDoubleTest), or for the Clojure code that has (inc 0.1) and calls
Double.valueOf(double) down in its implementation (called
spin-double1).  The parallel case only saves about 14% of the elapsed
time with 2 cores, and takes about 50% more total CPU time.

The more typical code that takes a single double value, initializes
it, and then adds 1 to it each time through the inner loop (Java
DoubleTest, Clojure spin-double2), is significantly faster than the
versions mentioned above.  Plus they exhibit the expected speedup from
using 2 cores instead of 1.
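
For readers who don't want to open the repository, the two kinds of
inner loop look roughly like the Java sketch below.  This is an
illustration only, not the actual DoubleTest or NewDoubleTest source;
the class and method names are made up.  It uses
Double.valueOf(double) rather than 'new Double' for the boxed variant,
and whether the JIT really allocates on every pass is an assumption
that depends on the JVM.

```java
// Hypothetical sketch of the two loop shapes being compared.
final class DoubleSpinSketch {

    // Primitive accumulator, in the spirit of DoubleTest / spin-double2:
    // one double lives in a register or on the stack and is incremented.
    static double spinPrimitive(long iterations) {
        double c = 0.0;
        for (long i = 0; i < iterations; i++) {
            c += 1.0;
        }
        return c;
    }

    // Boxing on every pass, in the spirit of NewDoubleTest / spin-double1:
    // each iteration boxes 0.1, unboxes it, and adds 1.  The value is the
    // same every time; only the (possible) allocation is of interest.
    static double spinBoxed(long iterations) {
        double result = 0.0;
        for (long i = 0; i < iterations; i++) {
            Double boxed = Double.valueOf(0.1);
            result = boxed + 1.0;
        }
        return result;
    }
}
```

The primitive loop does no heap allocation at all, while the boxed
loop may create one short-lived object per iteration, which is the
difference the measurements above are probing.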

http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/misc/RESULTS

I'm not sure how to determine why calling 'new Double' each time
through NewDoubleTest's inner loop makes 2 threads perform not much
better than 1.  The most plausible explanation I've heard is from
Nicolas Oury -- perhaps we are measuring the bandwidth from cache to
main memory, not the raw computational ability of the processor cores.

Andy