Re: Spread work onto multiple threads (in pure Clojure)

Tassilo Horn Tue, 11 Oct 2011 04:58:05 -0700

Andy Fingerhut <andy.finger...@gmail.com> writes:

Hi Andy,


> pmap will limit the maximum number of simultaneous threads.  So will
> the medusa library's medusa-pmap.
>
> The difference is that if one job early in the list takes, e.g., 100
> times longer than the next 20, and you have 4 cpus available, pmap
> will start the first (4+2)=6 in parallel threads, let jobs 2 through 6
> complete, and then wait for the first one to finish before starting
> number 7.  Thus most of the time will be spent running one thread.

Wow, that would render pmap pretty useless, IMO.  I've just checked the
code, but I cannot grasp it completely.  Could you please explain a bit,
or tell me where I'm wrong below?

(I think, I found the answer myself, so feel free to jump forwart do the
"AAAHHHH!!!".)

Here's the definition:

--8<---------------cut here---------------start------------->8---
(defn pmap
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)                  ;; (1)
         step (fn step [[x & xs :as vs] fs]               ;; (2)
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))    ;; (3)
                   (map deref vs))))]
     (step rets (drop n rets))))
  ;; Overloaded variant stripped
  )
--8<---------------cut here---------------end--------------->8---

Because rets is a lazy seq, nothing in it is actually realized in (1),
so that means that no future is created and thus no new thread is
started there.

The `step' function (2) always dereferences the first element of the
first seq (3), thus here is the place where the future is actually
created by submitting to some thread pool.  But dereferencing also
blocks until the result is calculated (i.e., calls future.get()).  So it
looks to me as if there's nothing parallel at all.

Of course, I must be wrong!  But where???

Hm, well, the destructuring in the arglist of `step' probably realizes
`x' (and maybe create a future for the first element in `xs').  So then
we'd have at most two active threads, but still we wait before starting
the next one...

AAAHHHH!!!  I think, I got it!  It's the `drop', isn't it?  To drop n
elements, you have to call `rest' n times, and because that calls `seq',
actually the first n elements are realized, that is, the tasks are
submitted to the thread pool.

So indeed, one starts with the number of available processors + 2, and
one single longer running task will wipe out any parallelism. :-(

IMO, that's not what 99% of the users would expect nor want when calling
(pmap foo coll).  I'd vote for making pmap eager in the sense that it
should always try to keep n threads working as long as more tasks are
available.  Clearly, the current implementation is useful when your
tasks are equally expensive, i.e., the costs don't depend on the
argument.

Bye,
Tassilo

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Spread work onto multiple threads (in pure Clojure)

Reply via email to