Re: Poor parallelization performance across 18 cores (but not 4)

Timothy Baldridge Wed, 18 Nov 2015 08:04:41 -0800

Oh, then I completely misunderstood the problem at hand here. If that's
the case, then do the following:


Change "atom" to "volatile!" and "swap!" to "vswap!". See if that changes
anything.
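A minimal sketch of that change applied to David's f2 (quoted further down). The name f2-volatile and its n parameter are hypothetical additions so it can be tried with a small count. volatile!/vswap! need Clojure 1.7 or later. A volatile has no CAS retry loop, which is only safe here because each call's x* never leaves its own thread.

```clojure
;; Hypothetical variant of f2: same loop, but the per-call state lives in
;; a volatile instead of an atom. vswap! is a plain read-modify-write with
;; no compare-and-set retry loop, so it must not be shared across threads.
(defn f2-volatile [n]
  (let [x* (volatile! {})]
    (loop [i n]
      (when-not (zero? i)
        (vswap! x* assoc :k i)
        (recur (dec i))))
    ;; Return the final map so the result can be checked.
    @x*))

(f2-volatile 1000)
```

Calling (f2-volatile 1e9) reproduces the original workload; comparing its 18-thread timing against the atom version would show whether the atom itself is the bottleneck.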

Timothy


On Wed, Nov 18, 2015 at 9:00 AM, David Iba <david...@gmail.com> wrote:

> Timothy:  Each thread (call of f2) creates its own "local" atom, so I
> don't think there should be any swap retries.
>
> Gianluca:  Good idea!  I've only tried OpenJDK, but I will look into
> trying Oracle and report back.
>
> Andy:  jvisualvm was showing pretty much all of the memory allocated in
> the eden space and a little in the first survivor (no major/full GCs), and
> total GC Time was very minimal.
>
> I'm in the middle of running some more tests and will report back when I
> get a chance today or tomorrow.  Thanks for all the feedback on this!
>
> On Thursday, November 19, 2015 at 12:38:55 AM UTC+9, tbc++ wrote:
>>
>> This sort of code is something of a worst-case situation for atoms (or
>> really for CAS). Clojure's swap! is based on the "compare-and-swap" (CAS)
>> operation that most x86 CPUs provide as an instruction. If we expand
>> swap!, it looks something like this:
>>
>> (loop [old-val @x*]
>>   (let [new-val (assoc old-val :k i)]
>>     (if (compare-and-set! x* old-val new-val)
>>       new-val
>>       (recur @x*))))
>>
>> Compare-and-swap can be defined as "update the contents of the reference
>> to new-val only if the reference still holds old-val" (for Clojure's
>> compare-and-set!, that means the identical object, not merely an equal one).
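A runnable form of the expansion above, written as a hypothetical cas-swap! helper (not part of clojure.core). It is only a sketch; as far as I know clojure.lang.Atom's swap does roughly this, plus validators and watches.

```clojure
;; Hypothetical hand-rolled swap! built on clojure.core/compare-and-set!.
;; compare-and-set! succeeds only if the atom still holds the identical
;; object we read as old-val; otherwise we re-read and recompute.
(defn cas-swap! [a f & args]
  (loop [old-val @a]
    (let [new-val (apply f old-val args)]
      (if (compare-and-set! a old-val new-val)
        new-val
        ;; Another thread changed the atom: throw away new-val and retry.
        (recur @a)))))

(def x* (atom {}))
(cas-swap! x* assoc :k 42)
```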
>>
>> So, in essence, only one core can modify the contents of an atom at a
>> time. If another thread modifies the atom during a swap! call, swap!
>> re-runs your function until it can update the atom without the atom
>> having been modified during the function's execution.
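One way to observe those retries is to count how often the update function runs when several threads share one atom. A minimal sketch; count-swap-calls is a hypothetical name. The exact number of calls varies from run to run, but it can never be lower than the number of successful updates.

```clojure
;; Hypothetical helper: n-threads futures each perform n-swaps swap!s on
;; one shared atom, while an AtomicLong counts every invocation of the
;; update fn (successful or retried).
(defn count-swap-calls [n-threads n-swaps]
  (let [a     (atom 0)
        calls (java.util.concurrent.atomic.AtomicLong. 0)
        futs  (doall
               (for [_ (range n-threads)]
                 (future
                   (dotimes [_ n-swaps]
                     (swap! a (fn [v]
                                (.incrementAndGet calls)
                                (inc v)))))))]
    ;; Wait for every thread before reading the results.
    (run! deref futs)
    {:value @a :fn-calls (.get calls)}))

(count-swap-calls 4 1000)
```

Any :fn-calls above :value is work that swap! threw away because another thread got there first.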
>>
>> So let's say you have some super long task that you need to integrate
>> into an atom. Here's one way to do it, but probably not the best:
>>
>> (let [a (atom 0)]
>>   (dotimes [x 18]
>>     (future
>>         (swap! a long-operation-on-score some-param))))
>>
>>
>> In this case long-operation-on-score will need to be re-run every time
>> another thread modifies the atom while it is running. However, if our
>> function only needs the state of the atom to add to it, then we can do
>> something like this instead:
>>
>> (let [a (atom 0)]
>>   (dotimes [x 18]
>>     (future
>>         (let [score (long-operation-on-score some-param)]
>>           (swap! a + score)))))
>>
>> Now we only have a simple addition inside the swap! and we will have less
>> contention between the CPUs because they will most likely be spending more
>> time inside 'long-operation-on-score' instead of inside the swap.
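A self-contained sketch of the second pattern. slow-score is a hypothetical stand-in for long-operation-on-score, and total-score is a hypothetical wrapper. Unlike the bare dotimes sketch, it keeps the futures and waits for them, so the atom is read only after every thread has finished.

```clojure
;; Hypothetical stand-in for long-operation-on-score: CPU work that
;; depends only on its parameter, not on the atom's current value.
(defn slow-score [param]
  (reduce + (range param)))

(defn total-score [n-threads param]
  (let [a    (atom 0)
        futs (mapv (fn [_]
                     (future
                       ;; Expensive work happens outside swap!...
                       (let [score (slow-score param)]
                         ;; ...so only the cheap + can be retried.
                         (swap! a + score))))
                   (range n-threads))]
    (run! deref futs)
    @a))

(total-score 18 1000)
```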
>>
>> *TL;DR*: do as little work as possible inside swap!. The more you have
>> inside swap!, the higher the chance you will throw away work due to
>> swap! retries.
>>
>> Timothy
>>
>> On Wed, Nov 18, 2015 at 8:13 AM, gianluca torta <giat...@gmail.com>
>> wrote:
>>
>>> by the way, have you tried both Oracle and Open JDK with the same
>>> results?
>>> Gianluca
>>>
>>> On Tuesday, November 17, 2015 at 8:28:49 PM UTC+1, Andy Fingerhut wrote:
>>>>
>>>> David, you say "Based on jvisualvm monitoring, doesn't seem to be
>>>> GC-related".
>>>>
>>>> What is jvisualvm showing you related to GC and/or memory allocation
>>>> when you tried the 18-core version with 18 threads in the same process?
>>>>
>>>> Even memory allocation could become a point of contention, depending
>>>> upon how the memory allocation works with many threads. For example, it depends on
>>>> whether a thread gets a large chunk of memory on a global lock, and then
>>>> locally carves it up into the small pieces it needs for each individual
>>>> Java 'new' allocation, or gets a global lock for every 'new'.  The latter
>>>> would give terrible performance as # cores increase, but I don't know how
>>>> to tell whether that is the case, except by knowing more about how the
>>>> memory allocator is implemented in your JVM.  Maybe digging through OpenJDK
>>>> source code in the right place would tell?
>>>>
>>>> Andy
>>>>
>>>> On Tue, Nov 17, 2015 at 2:00 AM, David Iba <davi...@gmail.com> wrote:
>>>>
>>>>> correction: that "do" should be a "doall". (My actual test code was a
>>>>> bit different, but each run printed some info when it started, so it
>>>>> doesn't have to do with delayed evaluation of lazy seqs or anything.)
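A sketch of the timing harness quoted below with that correction applied. run-parallel is a hypothetical name. doall realizes the lazy `for` so every future is created before any is dereferenced, rather than relying on how the lazy seq happens to be realized.

```clojure
;; Hypothetical harness: start n-runs copies of f in parallel, time each
;; one individually, and return all results once every run has finished.
(defn run-parallel [f n-runs]
  (let [futs (doall (for [_ (range n-runs)]
                      (future (time (f)))))]
    (mapv deref futs)))

;; Outer time gives the wall-clock total for all runs.
(time (run-parallel #(reduce + (range 1000000)) 18))
```

Substituting f1 or f2 from the original post for the anonymous function gives the per-thread and overall timings Niels asked about.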
>>>>>
>>>>>
>>>>> On Tuesday, November 17, 2015 at 6:49:16 PM UTC+9, David Iba wrote:
>>>>>>
>>>>>> Andy: Interesting. Thanks for educating me on the fact that atom
>>>>>> swap!s don't use the STM. Your theory seems plausible... I will try
>>>>>> those tests next time I launch the 18-core instance, but yeah, not
>>>>>> sure how illuminating the results will be.
>>>>>>
>>>>>> Niels: along the lines of this (so that each thread prints its time
>>>>>> as well as printing the overall time):
>>>>>>
>>>>>>    (time
>>>>>>     (let [f f1
>>>>>>           n-runs 18
>>>>>>           futs (do (for [i (range n-runs)]
>>>>>>                      (future (time (f)))))]
>>>>>>       (doseq [fut futs]
>>>>>>         @fut)))
>>>>>>
>>>>>>
>>>>>> On Tuesday, November 17, 2015 at 5:33:01 PM UTC+9, Niels van Klaveren
>>>>>> wrote:
>>>>>>>
>>>>>>> Could you also show how you are running these functions in parallel
>>>>>>> and timing them? The way you start the functions can have as much
>>>>>>> impact as the functions themselves.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Niels
>>>>>>>
>>>>>>> On Tuesday, November 17, 2015 at 6:38:39 AM UTC+1, David Iba wrote:
>>>>>>>>
>>>>>>>> I have functions f1 and f2 below, and let's say they run in T1 and
>>>>>>>> T2 amount of time when running a single instance/thread. The issue I'm
>>>>>>>> facing is that parallelizing f2 across 18 cores takes anywhere from
>>>>>>>> 2-5X T2, and for more complex funcs takes absurdly long.
>>>>>>>>
>>>>>>>>
>>>>>>>>    (defn f1 []
>>>>>>>>      (apply + (range 2e9)))
>>>>>>>>
>>>>>>>>    ;; Note: each call to (f2) makes its own x* atom, so the
>>>>>>>>    ;; 'swap!' should never retry.
>>>>>>>>    (defn f2 []
>>>>>>>>      (let [x* (atom {})]
>>>>>>>>        (loop [i 1e9]
>>>>>>>>          (when-not (zero? i)
>>>>>>>>            (swap! x* assoc :k i)
>>>>>>>>            (recur (dec i))))))
>>>>>>>>
>>>>>>>>
>>>>>>>> Of note:
>>>>>>>> - On a 4-core machine, both f1 and f2 parallelize well (roughly T1
>>>>>>>> and T2 for 4 runs in parallel)
>>>>>>>> - running 18 f1's in parallel on the 18-core machine also
>>>>>>>> parallelizes well.
>>>>>>>> - Disabling hyperthreading doesn't help.
>>>>>>>> - Based on jvisualvm monitoring, doesn't seem to be GC-related
>>>>>>>> - also tried on dedicated 18-core ec2 instance with same issues, so
>>>>>>>> not shared-tenancy-related
>>>>>>>> - if I make a jar that runs a single f2 and launch 18 in parallel,
>>>>>>>> it parallelizes well (so I don't think it's machine/aws-related)
>>>>>>>>
>>>>>>>> Could it be that the 18 f2's in parallel on a single JVM instance
>>>>>>>> are overworking the STM with all the swap!s? Any other theories?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>> Note that posts from new members are moderated - please be patient
>>>>> with your first post.
>>>>> To unsubscribe from this group, send email to
>>>>> clojure+u...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/clojure?hl=en
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to clojure+u...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> “One of the main causes of the fall of the Roman Empire was that–lacking
>> zero–they had no way to indicate successful termination of their C
>> programs.”
>> (Robert Firth)
>>
>


