Re: pmap: transducer & chunking

Timothy Dean Fri, 26 May 2017 09:29:50 -0700

You can also get an indication of the chunking from timings, I think:

user=> (defn f [x] (Thread/sleep 100) x)
#'user/f
user=> (time (doall (pmap f (range 32))))
"Elapsed time: 128.383719 msecs"
user=> (time (doall (pmap f (range 33))))
"Elapsed time: 201.462131 msecs"
user=> (time (doall (pmap f (range 64))))
"Elapsed time: 201.89915 msecs"
user=> (time (doall (pmap f (range 65))))
"Elapsed time: 301.711263 msecs"
user=> (time (doall (pmap f (range 96))))
"Elapsed time: 302.205999 msecs"
user=> (time (doall (pmap f (range 97))))
"Elapsed time: 402.236708 msecs"
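
The steps at 32, 64 and 96 elements look consistent with the chunk size of 
32 discussed in the quoted message below. As a rough check (a sketch only; 
c and n here are assumed to mirror the vector and list defined in that 
message), chunked-seq? reports whether a given input seq is chunked:

```clojure
;; Assumption: c and n mirror the definitions in the quoted message.
(def c (into [] (range 50)))   ; vector: its seq is chunked
(def n (into '() (range 50)))  ; list: its seq is not chunked

(chunked-seq? (seq c))
;; => true
(chunked-seq? (seq n))
;; => false
```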


Regarding motivation for having a pmap transducer, my own is mainly that it 
composes with other transducers. A major practical/implementation "triumph" 
of transducers may be the reduced garbage, but there is a benefit of 
simplified expression as well, since transducers encode transformation in a 
manner agnostic to many elements of the context of the transformation. In a 
project containing many transducers variously composed and consumed 
throughout, it may be that I wish to parallelize one portion of a 
transformation occurring in the middle of a composition of transducers. If 
so, I can't simply drop in clojure.core's pmap.

Instead I now must -- in the specific context of whatever higher-order 
transducing function I happen to be using -- break the composed transducers 
apart, generate a transitory sequence, pmap over that, and then proceed 
with transforming the temporary sequence. That is, in this case I'm 
concerned less with the resource overhead of swapping a transducer for a 
non-transducer than with the expressive overhead. Composition is elegant. 
My solutions with clojure.core's pmap are clunky. I'm not ruling out 
PEBCAK, but a pmap transducer seems to me the "obvious" solution for 
preserving compositional elegance.

For instance, a very simple example using transduce:

(transduce (comp (remove bad)
                 ...
                 (map transform)
                 ...
                 (keep pred))
           reducer
           init coll)

must become something like:

(->> coll
     (sequence (comp (remove bad)
                     ...))
     (pmap transform)
     (transduce (comp ...
                      (keep pred))
                reducer
                init))
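
To make that concrete, here is a small runnable version of the workaround. 
Everything in it is a stand-in chosen for illustration: odd? plays the role 
of bad, the hypothetical slow-inc plays transform, and an anonymous 
function plays pred.

```clojure
;; Hypothetical stand-in for an expensive transform.
(defn slow-inc [x] (Thread/sleep 100) (inc x))

(->> (range 10)
     ;; first half of the pipeline, realized as an intermediate sequence
     (sequence (remove odd?))
     ;; the parallel step, outside any transducer composition
     (pmap slow-inc)
     ;; second half of the pipeline, back in transducer form
     (transduce (keep #(when (> % 2) %)) + 0))
;; => 24
```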

But it could be as simple as:

(transduce (comp (remove bad)
                 ...
                 (pmap transform)
                 ...
                 (keep pred))
           reducer
           init coll)
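
For what it's worth, here is a minimal sketch of what such a transducer 
might look like. It is not clojure.core's pmap and not a proposed 
implementation: pmap-xf, its batch size n, and slow-inc are all 
hypothetical names. It simply batches inputs with partition-all, runs f on 
each batch in futures, and emits the results in input order.

```clojure
;; Hypothetical helper, not part of clojure.core.
(defn pmap-xf
  "Returns a transducer that applies f to inputs in parallel, one batch of
  n at a time, and passes the results downstream in input order."
  ([f] (pmap-xf f 32))
  ([f n]
   (comp
    (partition-all n)
    (mapcat (fn [batch]
              ;; mapv is eager, so every future in the batch starts now
              (let [futs (mapv #(future (f %)) batch)]
                ;; deref in order, which preserves input order
                (map deref futs)))))))

;; Hypothetical stand-in for an expensive transform.
(defn slow-inc [x] (Thread/sleep 100) (inc x))

(transduce (comp (remove odd?)
                 (pmap-xf slow-inc 4)
                 (keep #(when (> % 2) %)))
           + 0 (range 10))
;; => 24
```

The obvious limitation is that each batch must finish before the next one 
starts, so one slow element stalls its whole batch. The batch size n is 
also the only bound on concurrency here, since future does not limit the 
number of threads itself.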

Does that make sense, or am I crazy? :)

~Timothy Dean

On Friday, May 26, 2017 at 3:10:15 AM UTC-6, Matching Socks wrote:
>
> With the overhead of threading, isn't the triumph of a transducer (no
> seq allocations) rather subtle in the case of pmap?
>
>
> At any rate, as a point of interest, since you mentioned a quirk of
> pmap's thread usage: it apparently has to do with whether the input
> sequence is a chunked sequence or not.
>
> Comparing two possible inputs
>
> user> (def c (into [] (range 50))) ;; chunked
>
> user> (def n (into '() (range 50))) ;; not chunked
>
> With pmap invoking a function that just returns the thread's name (and
> takes some time, so as to keep its assigned thread busy enough not to
> accept more tasks right away)
>
> user> (defn f [x] (Thread/sleep 22) (.getName (Thread/currentThread)))
> #'user/f
> user> (count (distinct (pmap f c)))
> 32
> user> (count (distinct (pmap f n)))
> 7
>
> The test is not perfect because it does not show that all 32
> threads were used *at once*. There were simply 32 distinct threads used
> at one point or another.  Nonetheless the difference between 7 and 32 is
> suggestive.
>
> In this environment there are 4 "processors", so 32 is indeed more than
> you might want.
>
> The docstring about pmap could be clearer about this.
>
>
