Re: Parallelising over a lazy sequence - request for help

Andy Fingerhut Sat, 28 Sep 2013 08:27:34 -0700

I do not know about the most important parts of your performance difficulties, 
but on a more trivial point I might be able to shed some light.


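See the ClojureDocs page for pmap, which refers to the page for future, linked 
below.  pmap runs its work in futures, and the thread pool behind futures keeps 
idle non-daemon threads alive for 60 seconds, which holds the JVM open.  If you 
call (shutdown-agents) once all the work is done, the 60-second wait to exit 
should go away.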

    http://clojuredocs.org/clojure_core/clojure.core/future
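
As a rough sketch of where the call could go: the function name count-words-pmap and the tiny input below are made up for illustration and are not taken from Paul's repository. The idea is simply that (shutdown-agents) comes last in -main, after the result has been fully computed.

```clojure
;; Hypothetical stand-in for the partition-then-pmap word count in this thread.
(defn count-words-pmap
  "Counts words in parallel: one frequencies call per partition, merged at the end."
  [partition-size words]
  (reduce (partial merge-with +)
          {}
          (pmap frequencies (partition-all partition-size words))))

(defn -main [& _args]
  ;; reduce is eager, so every future started by pmap has finished
  ;; by the time count-words-pmap returns.
  (println (count-words-pmap 2 ["a" "b" "a" "c" "b" "a"]))
  ;; Shut down the thread pools backing futures and agents so the JVM
  ;; can exit right away. No future, pmap or agent work can be started
  ;; after this call.
  (shutdown-agents))
```

Calling (shutdown-agents) any earlier would make later pmap or future calls fail, so it only belongs at the point where the program is ready to exit.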

Andy


On Sep 28, 2013, at 1:41 AM, Paul Butcher <[email protected]> wrote:

> On 28 Sep 2013, at 00:27, Stuart Halloway <[email protected]> wrote:
> 
>> I have posted an example that shows partition-then-fold at 
>> https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj.
>> 
>> I would be curious to know how this approach performs with your data.  With 
>> the generated data I used, the partition+fold and partition+pmap approaches 
>> both used most of my cores and had similar perf.
> 
> Hey Stuart, 
> 
> Thanks for getting back to me.
> 
> I've updated the code for my word count example on GitHub with (I believe) 
> something that works along the lines you suggest:
> 
> https://github.com/paulbutcher/parallel-word-count
> 
> Here are some sample runs on my machine (a 4-core retina MacBook Pro). Each 
> of these runs counts the words in the first 10000 pages of Wikipedia:
> 
>> $ lein run 10000 ~/enwiki.xml sequential
>> "Elapsed time: 23630.443 msecs"
>> $ lein run 10000 ~/enwiki.xml fold
>> "Elapsed time: 8697.79 msecs"
>> $ lein run 10000 ~/enwiki.xml pmap 10000
>> "Elapsed time: 27393.703 msecs"
>> $ lein run 10000 ~/enwiki.xml pthenf 10000
>> "Elapsed time: 37263.193 msecs"
> 
> As you can see, the foldable-seq version gives an almost 3x speedup 
> relative to the sequential version, and both the partition-then-pmap and 
> partition-then-fold versions are significantly slower.
> 
> The last argument for the partition-then-pmap and partition-then-fold 
> versions is a partition size. I've tried various different sizes with no 
> material effect:
> 
>> $ lein run 10000 ~/enwiki.xml pthenf 1000
>> "Elapsed time: 43187.385 msecs"
>> $ lein run 10000 ~/enwiki.xml pthenf 100000
>> "Elapsed time: 35702.651 msecs"
>> $ lein run 10000 ~/enwiki.xml pmap 1000
>> "Elapsed time: 34087.314 msecs"
>> $ lein run 10000 ~/enwiki.xml pmap 100000
>> "Elapsed time: 47340.969 msecs"
> 
> The performance of the partition-then-pmap version is actually much worse 
> than the numbers above suggest. There's something very weird going on with (I 
> guess) garbage collection - it takes a *long* time to finish after printing 
> the elapsed time and the performance is completely pathological with larger 
> page counts.
> 
> Bottom line: the only way I've been able to obtain any kind of speedup 
> remains foldable-seq.
> 
> I'd be very grateful indeed if you could take a look at how I've implemented 
> partition-then-fold to make sure that I've correctly captured your intent. I'd 
> also welcome suggestions for anything else that might work, or an explanation 
> for the poor performance of partition-then-pmap and partition-then-fold.
> 
> My guess is that the problem with partition-then-fold is the copying that's 
> going on during the (into [] %). I can see that it is operating in parallel 
> because the number of cores in use goes up, but the net effect is an overall 
> slowdown rather than a speedup.
> 
> That it performs worse than foldable-seq isn't surprising to me, given that 
> it introduces an unnecessary copy.
> 
> I still think that it's a crying shame to disallow folding over sequences - 
> as the above shows, the gains both in performance and programming ease are 
> significant, and it would only take a small modification to the reducers API 
> to fix the holding-onto-head problem. What would be the downside of making 
> this modification and allowing foldable sequences?
> 
> 
> On 28 Sep 2013, at 00:27, Stuart Halloway <[email protected]> wrote:
> 
>> Hi Paul,
>> 
>> I have posted an example that shows partition-then-fold at 
>> https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj.
>> 
>> I would be curious to know how this approach performs with your data.  With 
>> the generated data I used, the partition+fold and partition+pmap approaches 
>> both used most of my cores and had similar perf.
>> 
>> Enjoying your book!
>> 
>> Stu
>> 
>> 
>> On Sat, May 25, 2013 at 12:34 PM, Paul Butcher <[email protected]> wrote:
>>> I'm currently working on a book on concurrent/parallel development for The 
>>> Pragmatic Programmers. One of the subjects I'm covering is parallel 
>>> programming in Clojure, but I've hit a roadblock with one of the examples. 
>>> I'm hoping that I can get some help to work through it here.
>>> 
>>> The example counts the words contained within a Wikipedia dump. It should 
>>> respond well to parallelisation (I have Java and Scala versions that 
>>> perform excellently) but I've been incapable of coming up with a nice 
>>> solution in Clojure.
>>> 
>>> The code I'm working with is checked into GitHub: 
>>> 
>>> The basic sequential algorithm is:
>>> 
>>>> (frequencies (mapcat get-words pages))
>>> 
>>> 
>>> If I run that on the first 10k pages in the Wikipedia dump, it takes ~21s 
>>> on my MacBook Pro.
>>> 
>>> One way to parallelise it is to create a parallel version of frequencies 
>>> that uses reducers:
>>> 
>>>> (defn frequencies-fold [words]
>>>>   (r/fold (partial merge-with +)
>>>>           (fn [counts word] (assoc counts word (inc (get counts word 0))))
>>>>           words))
>>> 
>>> And sure enough, if I use that along with the foldable-seq utility I 
>>> posted about here a while ago, it runs in ~8s, almost a 3x speedup. Not 
>>> bad given that the parallel version is unable to use transients.
>>> 
>>> Unfortunately, as they currently stand, reducers have a fatal flaw that 
>>> means that, even with foldable-seq, they're basically useless with lazy 
>>> sequences. Reducers always hold onto the head of the sequence they're 
>>> given, so there's no way to use this approach for a complete Wikipedia dump 
>>> (which runs to around 40GiB).
>>> 
>>> So the other approach I've tried is to use pmap:
>>> 
>>>> (defn frequencies-pmap [words]
>>>>   (reduce (partial merge-with +)
>>>>     (pmap frequencies
>>>>       (partition-all 10000 words))))
>>> 
>>> But, for reasons I don't understand, this performs dreadfully - taking 
>>> ~26s, i.e. significantly slower than the sequential version.
>>> 
>>> I've tried playing with different partition sizes without materially 
>>> affecting the result.
>>> 
>>> So, what I'm looking for is either:
>>> 
>>> a) An explanation for why the pmap-based solution performs so badly
>>> 
>>> b) A way to fix the "holding onto head" problem that's inherent within 
>>> reducers.
>>> 
>>> With the last of these in mind, it strikes me that the problem 
>>> fundamentally arises from the desire for reducers to follow the same basic 
>>> API as "normal" code. So:
>>> 
>>> (reduce (filter ... (map ... coll)))
>>> 
>>> becomes:
>>> 
>>> (r/fold (r/filter ... (r/map ... coll)))
>>> 
>>> A very small change to the reducers API - passing the collection to the 
>>> reduce and/or fold - would avoid the problem:
>>> 
>>> (r/fold (r/filter ... (r/map ...)) coll)
>>> 
>>> Anyway - I'd be very grateful for any insight into either of the above 
>>> questions. Or for suggestions for an alternative approach that might be 
>>> more fruitful.
>>> 
>>> Many thanks in advance,
>>> 
>>> 
>>> -- 
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to [email protected]
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> --- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>> 
>> 
> 
