Johann, if you are still following this thread, could you try running this Clojure program on your 8 core machine?

http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/misc/pmap-testing.clj

The first set of parameters below will do 8 jobs sequentially, each doing 10^10 (inc c)'s, where c is a double primitive. The second will do the 8 jobs in parallel, hopefully finishing about 8 times faster on your machine:

    java -cp <your_class_path> clojure.main pmap-testing.clj double2 8 10000000000 1
    java -cp <your_class_path> clojure.main pmap-testing.clj double2 8 10000000000 8

If you replace double2 with double1, it should reproduce your initial test case with (inc 0.1) in the inner loop -- the one that started this thread about why there wasn't much speedup.

I created some Clojure and Java functions that are as similar as I know how to make them, but frankly I don't know whether my JVM implementation (Apple's java 1.6.0_13) uses 'new Double' in its implementation of Double.valueOf(double), or a cache as mentioned by John Harrop earlier in this discussion. What I have found is that the Clojure version's performance is very similar to that of a Java program that uses 'new Double' explicitly in its inner loop.

In the results linked below, the 'sequential' runs call the named function twice in a row, whereas the 'parallel' runs use two parallel threads on my 2 core machine, each of which calls the named function once. The total amount of computation in each case should be the same, so you'd hope that the parallel case would finish in about half the time, with about the same amount of total CPU time being utilized.

That isn't what happens for the Java code with 'new Double' (called NewDoubleTest), or for the Clojure code that has (inc 0.1) and calls Double.valueOf(double) down in its implementation (called spin-double1). The parallel case only saves about 14% of the elapsed time with 2 cores, and takes about 50% more total CPU time.
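To make the two kinds of inner loop concrete, here is a rough Java sketch of the comparison. It is not the code from the repository; the class name SpinSketch and the methods spinBoxed, spinPrimitive and runJobs are made up for illustration. The boxed loop goes through Double.valueOf(double) on every iteration (which may or may not allocate, depending on the JVM and on what the JIT does with it), and a plain primitive double counter is included for comparison. The iteration count is far smaller than the 10^10 used above, so treat any timings as a rough indication only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are illustrative, not taken from the repository.
final class SpinSketch {
    // Boxed variant: re-boxes the counter through Double.valueOf each iteration.
    static double spinBoxed(long iterations) {
        Double c = Double.valueOf(0.0);
        for (long i = 0; i < iterations; i++) {
            c = Double.valueOf(c.doubleValue() + 1.0);
        }
        return c.doubleValue();
    }

    // Primitive variant: a single double that is incremented in place.
    static double spinPrimitive(long iterations) {
        double c = 0.0;
        for (long i = 0; i < iterations; i++) {
            c += 1.0;
        }
        return c;
    }

    // Runs `jobs` copies of one loop on a pool of `threads` threads and
    // returns each job's result. threads == 1 gives the sequential case.
    static List<Double> runJobs(int jobs, int threads, long iterations, boolean boxed) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Double> job =
                () -> boxed ? spinBoxed(iterations) : spinPrimitive(iterations);
            List<Future<Double>> futures = new ArrayList<>();
            for (int j = 0; j < jobs; j++) {
                futures.add(pool.submit(job));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get()); // blocks until that job has finished
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int jobs = 2;                   // matches a 2 core machine
        long iterations = 100_000_000L; // assumption: much smaller than 10^10
        for (boolean boxed : new boolean[] {true, false}) {
            for (int threads : new int[] {1, jobs}) {
                long start = System.nanoTime();
                runJobs(jobs, threads, iterations, boxed);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("%s, %d thread(s): %d ms%n",
                        boxed ? "boxed" : "primitive", threads, ms);
            }
        }
    }
}
```

This only reports elapsed time, not total CPU time, and a modern JIT may optimize away the boxing entirely, so it will not necessarily reproduce the numbers discussed in this thread.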
The more typical code that takes a single double value, initializes it, and then adds 1 to it each time through the inner loop (Java DoubleTest, Clojure spin-double2) is significantly faster than the versions mentioned above. It also exhibits the expected speedup from using 2 cores instead of 1.

http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/misc/RESULTS

I'm not sure how to determine why calling 'new Double' each time through NewDoubleTest's inner loop causes 2 threads to perform not much better than 1. The best explanation I've heard so far is from Nicolas Oury -- perhaps we are measuring the bandwidth from cache to main memory, not the raw computational ability of the processor cores.

Andy