Re: Poor parallelization performance across 18 cores (but not 4)

David Iba Tue, 17 Nov 2015 01:50:04 -0800

Andy:  Interesting.  Thanks for educating me on the fact that atom swaps 
don't use the STM.  Your theory seems plausible... I will try those tests 
next time I launch the 18-core instance, but yeah, not sure how 
illuminating the results will be.
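
For anyone else reading along, here is a rough sketch of why no STM is involved: swap! on an atom is essentially a compare-and-set retry loop, with no dosync or transaction. The helper name cas-swap! is made up for illustration and is not part of clojure.core; the real swap! also handles validators and watches, which this sketch ignores.

```clojure
;; Hypothetical helper (not in clojure.core): a simplified swap! built on
;; compare-and-set! to make the retry loop visible.
(defn cas-swap!
  [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      ;; compare-and-set! succeeds only if the atom still holds old-val;
      ;; otherwise another thread got in first and we retry.
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

(def counter (atom {}))
(cas-swap! counter assoc :k 1)
```

With an atom that only one thread touches, as in f2, the compare-and-set should succeed on the first attempt every time.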


Niels: along the lines of this (so that each thread prints its time as well 
as printing the overall time):

   (time
    (let [f f1
          n-runs 18
          ;; doall forces the lazy seq so every future starts up front
          futs (doall (for [i (range n-runs)]
                        (future (time (f)))))]
      (doseq [fut futs]
        @fut)))
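
A variation on the same idea, in case it is easier to compare runs when the timings come back as data instead of being printed. This is only a sketch: timed-call and run-parallel are names invented here, and the small workload at the bottom is a stand-in so it finishes quickly.

```clojure
;; Hypothetical helpers for illustration; not from any library.
(defn timed-call
  "Calls f and returns the elapsed wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn run-parallel
  "Starts n futures that each time one call to f, waits for all of them,
  and returns the per-thread times and the overall time in milliseconds."
  [f n]
  (let [start (System/nanoTime)
        ;; doall forces the lazy seq so all futures are launched up front
        futs  (doall (repeatedly n #(future (timed-call f))))
        times (mapv deref futs)]
    {:per-thread-ms times
     :total-ms      (/ (- (System/nanoTime) start) 1e6)}))

;; Stand-in workload; substitute f1 or f2 and the real core count.
(def result (run-parallel #(reduce + (range 1000000)) 4))
```

Comparing :per-thread-ms against a single-threaded run should show whether each call is individually slower under load or whether only the total is. When run as a script, (shutdown-agents) lets the JVM exit once the futures finish.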
   

On Tuesday, November 17, 2015 at 5:33:01 PM UTC+9, Niels van Klaveren wrote:
>
> Could you also show how you are running these functions in parallel and 
> time them ? The way you start the functions can have as much impact as the 
> functions themselves.
>
> Regards,
> Niels
>
> On Tuesday, November 17, 2015 at 6:38:39 AM UTC+1, David Iba wrote:
>>
>> I have functions f1 and f2 below, and let's say they run in T1 and T2 
>> amount of time when running a single instance/thread.  The issue I'm facing 
>> is that parallelizing f2 across 18 cores takes anywhere from 2-5X T2, and 
>> for more complex funcs takes absurdly long.
>>
>>
>>    (defn f1 []
>>      (apply + (range 2e9)))
>>
>>    ;; Note: each call to (f2) makes its own x* atom, so the 'swap!'
>>    ;; should never retry.
>>    (defn f2 []
>>      (let [x* (atom {})]
>>        (loop [i 1e9]
>>          (when-not (zero? i)
>>            (swap! x* assoc :k i)
>>            (recur (dec i))))))
>>    
>>
>> Of note:
>> - On a 4-core machine, both f1 and f2 parallelize well (roughly T1 and T2 
>> for 4 runs in parallel)
>> - running 18 f1's in parallel on the 18-core machine also parallelizes 
>> well.
>> - Disabling hyperthreading doesn't help.
>> - Based on jvisualvm monitoring, doesn't seem to be GC-related
>> - also tried on dedicated 18-core ec2 instance with same issues, so not 
>> shared-tenancy-related
>> - if I make a jar that runs a single f2 and launch 18 in parallel, it 
>> parallelizes well (so I don't think it's machine/aws-related)
>>
>> Could it be that running the 18 f2's in parallel on a single JVM instance 
>> is overworking the STM with all the swaps?  Any other theories?
>>
>> Thanks!
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en