Re: abysmal multicore performance, especially on AMD processors

cameron Sun, 30 Dec 2012 18:28:08 -0800

I've posted a patch with some changes here 
(https://gist.github.com/4416803). It includes the record change quoted 
below and a small change to interpret-instruction; with it the benchmark 
runs >2x faster than the default, as it did for Marshall.
The patch also modifies the main loop to use a thread pool instead of 
agents and allows you to set the number of threads, which might help in 
diagnosing the parallel performance issue.
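
To illustrate the idea (a minimal sketch, not the code in the gist; 
run-in-pool and its parameters are hypothetical names), a fixed-size pool 
from java.util.concurrent lets you choose the thread count explicitly 
instead of relying on the pools that back agents:

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

;; Hypothetical helper, not taken from the patch.
(defn run-in-pool
  "Applies f to each element of xs on a fixed pool of n-threads threads
  and returns the results as a vector, in the order of xs."
  [n-threads f xs]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      (let [tasks   (mapv (fn [x] #(f x)) xs)   ; Clojure fns implement Callable
            futures (.invokeAll pool tasks)]     ; blocks until every task is done
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))

;; Vary the first argument to see how the timing scales with thread count.
(run-in-pool 4 #(* % %) (range 8)) ;=> [0 1 4 9 16 25 36 49]
```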

On the modified benchmark I'm seeing ~4x speedup with the parallel version 
on an 8-core machine, and the profiler reports that the parallel version is 
using twice as much CPU time.
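
As a rough sanity check on those two numbers (a back-of-the-envelope 
sketch that assumes all 8 cores stay busy for the whole run; ideal-speedup 
is a hypothetical helper, not part of the patch): if the parallel version 
burns k times the serial CPU time spread over n cores, the best wall-clock 
speedup you can expect is n/k.

```clojure
;; Hypothetical helper, not part of the patch.
(defn ideal-speedup
  "Best-case wall-clock speedup when the parallel run uses cpu-overhead
  times the serial CPU time, spread evenly over cores cores."
  [cores cpu-overhead]
  (/ cores cpu-overhead))

(ideal-speedup 8 2) ;=> 4
```

Under that assumption, ~4x is about what 2x the CPU time on 8 cores allows.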

I also had another look at the native calls issue and modified the Clojure 
runtime to avoid most of the calls the profiler said were taking 
significantly more time in the parallel version. It did speed things up, but 
only by ~6%, not the large margin the profiling results had led me to 
believe was possible. It looks like the profiler overstates these methods' 
times. The modified Clojure 1.5 is available here if anyone's interested:
https://github.com/cdorrat/clojure/commit/dfb5f99eb5d0a45165978e079284bab1f25bd79f

YourKit is reporting that a number of clojure.core functions are taking 
longer in the parallel version than in the serial one. They all seem to be 
methods that have one or more instanceof or instance? calls, but given the 
results above I'm not sure how much weight to give this.

It seems the elephant is still in the room and responsible for ~50% of the 
CPU time :)

Cameron.

On Saturday, December 22, 2012 10:57:28 AM UTC+11, Marshall 
Bockrath-Vandegrift wrote:
>
> Lee Spector <lspe...@hampshire.edu> writes: 
>
> > FWIW I used records for push-states at one point but did not observe a 
> > speedup and it required much messier code, so I reverted to 
> > struct-maps. But maybe I wasn't doing the right timings. I'm curious 
> > about how you changed to records without the messiness. I'll include 
> > below my sig the way that I had to do it... maybe you can show me what 
> > you did instead. 
>
> I just double-checked, and I definitely see a >2x speedup on Josiah’s 
> benchmark.  That may still be synthetic, of course.  Here’s what I did: 
>
>     (eval `(defrecord ~'PushState
>              [~'trace ~@(map (comp symbol name) push-types)])) 
>     
>     (let [empty-state (map->PushState {})] 
>       (defn make-push-state 
>         "Returns an empty push state." 
>         [] empty-state)) 
>
> > Still, I guess the gorilla in the room, which is eating the multicore 
> > performance, hasn't yet been found. 
>
> No, not yet...  I’ve become obsessed with figuring it out though, so 
> still slogging at it. 
>
> -Marshall 
>
>
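
For readers unfamiliar with the trick in Marshall's snippet, here is a small 
sketch of what it does. The push-types value below is a made-up stand-in 
(the real one lives in the benchmark code): the syntax-quoted form builds 
a defrecord whose fields are trace plus one field per push type, and eval 
defines it when the namespace is loaded.

```clojure
;; Hypothetical stand-in for the benchmark's real list of types.
(def push-types [:integer :boolean])

;; Expands to (defrecord PushState [trace integer boolean]) and evaluates it.
(eval `(defrecord ~'PushState
         [~'trace ~@(map (comp symbol name) push-types)]))

;; Every declared field starts out nil; one shared empty record can be
;; handed out because records are immutable.
(def empty-state (map->PushState {}))

(:integer empty-state)                        ;=> nil
(:integer (assoc empty-state :integer [1 2])) ;=> [1 2]
```

The record can still be read and updated with keyword lookup and assoc, 
just like a struct-map.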

