Re: Parallelising over a lazy sequence - request for help

Alex Miller Sat, 28 Sep 2013 13:48:51 -0700

For your timings, I would also strongly recommend altering your project.clj 
to force the -server hotspot:


  :jvm-opts ^:replace ["-Xmx1g" "-server"  ... and whatever else you want 
here ... ]

By default lein will use tiered compilation to optimize repl startup, which 
is not what you want for timing.

And I second Andy's (shutdown-agents) advice. 


On Saturday, September 28, 2013 3:41:20 AM UTC-5, Paul Butcher wrote:
>
> On 28 Sep 2013, at 00:27, Stuart Halloway <[email protected]<javascript:>> 
> wrote:
>
> I have posted an example that shows partition-then-fold at 
> https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj
> .
>
> I would be curious to know how this approach performs with your data. 
>  With the generated data I used, the partition+fold and partition+pmap 
> approaches both used most of my cores and had similar perf.
>
>
> Hey Stuart, 
>
> Thanks for getting back to me.
>
> I've updated the code for my word count example on GitHub with (I believe) 
> something that works along the lines you suggest:
>
> https://github.com/paulbutcher/parallel-word-count
>
> Here are some sample runs on my machine (a 4-core retina MacBook Pro). 
> Each of these runs counts the words in the first 10000 pages of Wikipedia:
>
> $ lein run 10000 ~/enwiki.xml sequential
> "Elapsed time: 23630.443 msecs"
> $ lein run 10000 ~/enwiki.xml fold
> "Elapsed time: 8697.79 msecs"
> $ lein run 10000 ~/enwiki.xml pmap 10000
> "Elapsed time: 27393.703 msecs"
> $ lein run 10000 ~/enwiki.xml pthenf 10000
> "Elapsed time: 37263.193 msecs"
>
>
> As you can see, the the foldable-seq version gives an almost 3x speedup 
> relative to the sequential version, and both the partition-then-pmap and 
> partition-then-fold versions are significantly slower.
>
> The last argument for the partition-then-pmap and partition-then-fold 
> versions is a partition size. I've tried various different sizes with no 
> material effect:
>
> $ lein run 10000 ~/enwiki.xml pthenf 1000
> "Elapsed time: 43187.385 msecs"
> $ lein run 10000 ~/enwiki.xml pthenf 100000
> "Elapsed time: 35702.651 msecs"
> $ lein run 10000 ~/enwiki.xml pmap 1000
> "Elapsed time: 34087.314 msecs"
> $ lein run 10000 ~/enwiki.xml pmap 100000
> "Elapsed time: 47340.969 msecs"
>
>
> The performance of the partition-then-pmap version is actually much worse 
> than the numbers above suggest. There's something very weird going on with 
> (I guess) garbage collection - it takes a *long* time to finish after 
> printing the elapsed time and the performance is completely pathological 
> with larger page counts.
>
> Bottom line: the only way I've been able to obtain any kind of speedup 
> remains foldable-seq.
>
> I'd be very grateful indeed if you could take a look at how I've 
> implemented partition-then-fold to make sure that I've correctly captured 
> your intent. Or if you have any suggestions for anything else that might 
> work, or to explain the poor performance of partition-then-pmap and 
> partition-then-fold.
>
> My guess is that the problem with partition-then-fold is the copying 
> that's going on during the (into [] %). I can see that it is operating in 
> parallel because the number of cores in use goes up, but the net effect is 
> an overall slowdown rather than a speedup.
>
> That it performs worse than foldable-seq isn't surprising to me, given 
> that it introduces an unnecessary copy.
>
> I still think that it's a crying shame to disallow folding over sequences 
> - as the above shows, the gains both in performance and programming ease 
> are significant, and it would only take a small modification to the 
> reducers API to fix the holding-onto-head problem. What would be the 
> downside of making this modification and allowing foldable sequences?
>
> --
> paul.butcher->msgCount++
>
> Snetterton, Castle Combe, Cadwell Park...
> Who says I have a one track mind?
>
> http://www.paulbutcher.com/
> LinkedIn: http://www.linkedin.com/in/paulbutcher
> MSN: [email protected] <javascript:>
> AIM: paulrabutcher
> Skype: paulrabutcher
>  
> On 28 Sep 2013, at 00:27, Stuart Halloway <[email protected]<javascript:>> 
> wrote:
>
> Hi Paul,
>
> I have posted an example that shows partition-then-fold at 
> https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj
> .
>
> I would be curious to know how this approach performs with your data. 
>  With the generated data I used, the partition+fold and partition+pmap 
> approaches both used most of my cores and had similar perf.
>
> Enjoying your book!
>
> Stu
>
>
> On Sat, May 25, 2013 at 12:34 PM, Paul Butcher 
> <[email protected]<javascript:>
> > wrote:
>
>> I'm currently working on a book on concurrent/parallel development for 
>> The Pragmatic Programmers. One of the subjects I'm covering is parallel 
>> programming in Clojure, but I've hit a roadblock with one of the examples. 
>> I'm hoping that I can get some help to work through it here.
>>
>> The example counts the words contained within a Wikipedia dump. It should 
>> respond well to parallelisation (I have Java and Scala versions that 
>> perform excellently) but I've been incapable of coming up with a nice 
>> solution in Clojure.
>>
>> The code I'm working with is checked into 
>> GitHub<https://github.com/paulbutcher/parallel-word-count>
>> : 
>>
>> The basic sequential algorithm is:
>>
>> (frequencies (mapcat get-words pages))
>>
>>
>> If I run that on the first 10k pages in Wikipedia dump, it takes ~21s on 
>> my MacBook Pro.
>>
>> One way to parallelise it is to create a parallel version of frequencies 
>> that uses reducers:
>>
>> (defn frequencies-fold [words]
>>   (r/fold (partial merge-with +)
>>           (fn [counts word] (assoc counts word (inc (get counts word 0
>> ))))
>>
>>
>>           words))
>>
>>
>> And sure enough, if I use that, along with use the 
>> foldable-seq<https://github.com/paulbutcher/parallel-word-count/blob/master/src/foldable_seq/core.clj>
>>  utility 
>> I posted about 
>> here<https://groups.google.com/d/msg/clojure/8RKCjF00ukQ/b5mmmOB5Uh4J> are 
>> while ago it runs in ~8s, almost a 3x speedup, not bad given that the 
>> parallel version is unable to use transients.
>>
>> Unfortunately, as they currently stand, reducers have a fatal flaw that 
>> means that, even with foldable-seq, they're basically useless with lazy 
>> sequences. Reducers always hold onto the 
>> head<https://groups.google.com/d/msg/clojure-dev/qJo7z_9CVdw/ARnHe1bThuMJ> 
>> of 
>> the sequence they're given, so there's no way to use this approach for a 
>> complete Wikipedia dump (which runs to around 40GiB).
>>
>> So the other approach I've tried is to use pmap:
>>
>> (defn frequencies-pmap [words]
>>   (reduce (partial merge-with +) 
>>     (pmap frequencies 
>>       (partition-all 10000 words))))
>>
>>
>> But, for reasons I don't understand, this performs dreadfully - taking 
>> ~26s, i.e. significantly slower than the sequential version.
>>
>> I've tried playing with different partition sizes without materially 
>> affecting the result.
>>
>> So, what I'm looking for is either:
>>
>> a) An explanation for why the pmap-based solution performs so badly
>>
>> b) A way to fix the "holding onto head" problem that's inherent within 
>> reducers.
>>
>> With the last of these in mind, it strikes me that the problem 
>> fundamentally arises from the desire for reducers to follow the same basic 
>> API as "normal" code. So:
>>
>> (reduce (filter ... (map ... coll)))
>>
>> becomes:
>>
>> (r/fold (r/filter ... (r/map ... coll)))
>>
>> A very small change to the reducers API - passing the collection to the 
>> reduce and/or fold - would avoid the problem:
>>
>> (r/fold (r/filter ... (r/map ...)) coll)
>>
>> Anyway - I'd be very grateful for any insight into either of the above 
>> questions. Or for suggestions for an alternative approach that might be 
>> more fruitful.
>>
>> Many thanks in advance,
>>
>> --
>> paul.butcher->msgCount++
>>
>> Snetterton, Castle Combe, Cadwell Park...
>> Who says I have a one track mind?
>>
>> http://www.paulbutcher.com/
>> LinkedIn: http://www.linkedin.com/in/paulbutcher
>> MSN: [email protected] <javascript:>
>> AIM: paulrabutcher
>> Skype: paulrabutcher
>>  
>>
>> -- 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to [email protected]<javascript:>
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> [email protected] <javascript:>
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
> -- 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected] <javascript:>
> Note that posts from new members are moderated - please be patient with 
> your first post.
> To unsubscribe from this group, send email to
> [email protected] <javascript:>
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] <javascript:>.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Parallelising over a lazy sequence - request for help

Reply via email to