Re: How does pmap partition its work?

Ken Wesson Sun, 23 Jan 2011 19:31:36 -0800

On Sun, Jan 23, 2011 at 8:56 PM, Michael Gardner <gardne...@gmail.com> wrote:
> Suppose I have a sequence of tasks I'd like to parallelize using pmap. The 
> amount of CPU time these tasks require varies greatly; in particular, many of 
> them will require virtually no work. Can I rely on pmap to divide the work 
> efficiently even if there is some pattern to the distribution of easy tasks 
> (e.g. clumped at the beginning, or every nth)? Assume it's not possible to 
> identify the easy tasks beforehand.


Here's a little data:
user=> (def printer (agent nil))
#'user/printer
user=> (defn xyz [n] (send printer (fn [_] println
(Thread/currentThread))) (* n n))
#'user/xyz
user=> (xyz 3)
#<Thread Thread[Thread-2,5,main]>
9
user=> (pmap xyz (range 20))
#<Thread Thread[pool-2-thread-25,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-25,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-21,5,main]>
#<Thread Thread[pool-2-thread-21,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-29,5,main]>
#<Thread Thread[pool-2-thread-29,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-22,5,main]>
#<Thread Thread[pool-2-thread-30,5,main]>
#<Thread Thread[pool-2-thread-28,5,main]>
#<Thread Thread[pool-2-thread-26,5,main]>
#<Thread Thread[pool-2-thread-24,5,main]>
(0 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361)
user=>

The fiddling with an agent is to prevent individual println outputs
from getting intermingled; instead the lines are perhaps slightly out
of order but at least are not mangled into unreadability.

As you can see, there are at least seven threads called somewhat
randomly; not a simple alternating pattern of 2 as you might naively
expect on a dual-core machine. It seems to prefer thread 22 for some
reason.

I don't know exactly what is going on under the hood but it looks like
it either involves some degree of randomness, or it tries to
load-balance (and something else was sometimes preempting threads -- I
don't know what, when I ran this nothing on my system was doing
anything CPU-intensive with a priority equal to or higher than the
REPL process, but NetBeans and the REPL were both being a bit
sluggish; some sort of Windows weirdness). Either way it's likely that
a mix of easy and hard jobs, even with a pattern, won't screw things
up too much.

That can be further tested:

user=> (defn heavy [f amt] (fn [x] (Thread/sleep amt) (f x)))
#'user/heavy
user=> (defn s [x] (* x x))
#'user/x
user=> (def l (heavy s 1000))
#'user/y
user=> (defn run [fseq n] (time (doall (pmap #(%1 %2) fseq (range n)))) nil)
#'user/run
user=> (run (cycle [s l]) 100)
"Elapsed time: 17141.64872 msecs"
nil
user=> (run (cycle [s s l l]) 100)
"Elapsed time: 17020.59136 msecs"
nil
user=> (run (cycle [s s l s l l]) 100)
"Elapsed time: 17007.0366 msecs"
nil
user=> (defn randomize [s] (let [n (count s)] (repeatedly #(nth s
(rand-int n)))))
#'user/randomize
user=> (run (randomize [s l]) 100)
"Elapsed time: 18015.8562 msecs"
nil

As you can see, the timings are fairly consistent when a 50/50 mix of
short (<1msec) and long (>1000msec) tasks get parallelized using pmap,
with several possible patterns and also with no pattern to their
distribution.

This was again on a dual-core machine; it's interesting that the
numbers are around 17-18s instead of 25s (100 jobs, averaging 500ms
each, split between two cores). The use of more than two threads in
the thread pool is presumably the cause of this; Thread/sleep lets
another of the threads make progress. It didn't collapse down to just
1s so it's also not the case that pmap is spawning as many threads as
will maximize throughput (fifty running the slow jobs plus two running
the fast ones on both cores while the slow ones sleep). Since the jobs
would cumulatively take 50s if run serially, the speedup factor from
pmap here is just under 3.

One more data point:

user=> (run (repeat l) 50)
"Elapsed time: 10172.37124 msecs"
nil

That's fifty long jobs without any short ones. The speedup factor here
is 5 rather than 3. This shows that it will allow five, but no more,
jobs to be running concurrently, and that it doesn't load balance. If
it did, adding the short jobs wouldn't slow it down more than a few
ms; in fact doing so almost doubles the time, so it seems the jobs are
preallocated in some manner and the short jobs take up half the
available threads in the pool. (This is odd given that it seemed
before to be using at least seven threads, not just five.)

Of course, real jobs are likely to be CPU-intensive instead of
yielding like Thread/sleep.

user=> (def foo (atom nil))
#'user/foo
user=> (defn l2 [x] (doseq [a (range 7500000)] (reset! foo (* a x))) (* x x))
#'user/l2

The number 7500000 causes l2 to take about 900ms on my machine.

user=> (run (cycle [s l2]) 100)
"Elapsed time: 37024.37508 msecs"
nil

Oddly, quite a bit less than 50s. Perhaps reset! spends some time
yielding the CPU for some reason?

user=> (run (cycle [s s l2 l2]) 100)
"Elapsed time: 46963.28936 msecs"
nil
user=> (run (cycle [s s l2 s l2 l2]) 100)
"Elapsed time: 36551.62528 msecs"
nil
user=> (run (randomize [s l2]) 100)
"Elapsed time: 51713.50044 msecs"
nil
user=> (run (repeat l2) 50)
"Elapsed time: 43197.77312 msecs"
nil

These numbers suggest that the CPU load will generally be divvied up
equally among your cores, if the jobs don't yield the CPU (sleep,
block on I/O, block on monitors). If they do yield the CPU here and
there, it looks like pmap may perform slightly better, rather than
slightly worse, if the jobs that do so form a pattern.

Long story short: Don't worry about it. :)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: How does pmap partition its work?

Reply via email to