To be clear, I don't object to the approach, only to naming it "fold" and/or tying it to interfaces related to folding.
Stu

On Sat, Sep 28, 2013 at 5:29 PM, Paul Butcher <p...@paulbutcher.com> wrote:

> On 28 Sep 2013, at 22:00, Alex Miller <a...@puredanger.com> wrote:
>
>> Reducers (and fork/join in general) are best suited for fine-grained
>> computational parallelism on in-memory data. The problem in question
>> involves processing more data than will fit in memory.
>>
>> So the question is then what is the best way to parallelize computation
>> over the stream. There are many ways to do so - pmap is one easy mechanism
>> to get parallelism over a lazy sequence but the chunking behavior of pmap
>> can easily be non-optimal for particular use cases. Stu's other solution
>> (prepare-with-partition-then-fold) reads chunks of data into memory, then
>> uses f/j over the top of those chunks.
>
> Yes - I'm aware that that's how Stu's solution works. However, when I use
> that in my example, the performance sucks. As I said in my earlier message,
> I believe that the reason for this is the additional copying that it
> involves.
>
> To date, I have exactly one implementation that has good parallel
> performance - the one that uses foldable-seq (which is 3x faster than the
> sequential version). Everything else I've tried performs worse than the
> sequential version.
>
>> Yet another possible solution is to use a pool of compute threads and to
>> read from the stream, give those compute tasks to the pool to execute in a
>> background thread, then reassemble the results at the end (or in a
>> separate thread). The balance in this approach is to make the tasks the
>> right "chunkiness". Using a buffered queue is one technique to avoid
>> having your input reader overload the capacity of the processing system.
>
> That's basically *exactly* what the foldable-seq implementation does.
>
>> I would also mention that using transients as you build input collections
>> will be a big win.
> I'm sure that that's true - but the foldable-seq implementation is the one
> that would gain most from that, and that's the one that already runs 3x
> faster than the sequential implementation. But it's also the one that Stu
> objects to. What I'm trying to find is an implementation that Stu doesn't
> object to, but is faster than the sequential version of the algorithm.
>
> --
> paul.butcher->msgCount++
>
> Snetterton, Castle Combe, Cadwell Park...
> Who says I have a one track mind?
>
> http://www.paulbutcher.com/
> LinkedIn: http://www.linkedin.com/in/paulbutcher
> MSN: p...@paulbutcher.com
> AIM: paulrabutcher
> Skype: paulrabutcher
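Alex's transients tip, applied to the word-count reducing step, might look something like the following sketch (illustrative only, not code from the thread; `frequencies-transient` is a made-up name):

```clojure
;; Sketch: build the counts map with a transient to cut allocation.
;; Illustrative helper, not from the thread.
(defn frequencies-transient [words]
  (persistent!
   (reduce (fn [counts word]
             (assoc! counts word (inc (get counts word 0))))
           (transient {})
           words)))
```

This is essentially how clojure.core/frequencies is already implemented; the win Alex describes comes from applying the same trick while assembling the input chunks themselves.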
> On Saturday, September 28, 2013 11:14:41 AM UTC-5, Jozef Wagner wrote:
>
>> I would go a bit further and suggest that you do not use sequences at all
>> and work only with reducible/foldable collections. Make an input reader
>> which returns a foldable collection and you will have the most performant
>> solution. The thing about holding onto the head is being worked on right
>> now, see http://dev.clojure.org/jira/browse/CLJ-1250
>>
>> JW
>>
>> On Saturday, September 28, 2013 10:41:20 AM UTC+2, Paul Butcher wrote:
>>
>>> On 28 Sep 2013, at 00:27, Stuart Halloway <stuart....@gmail.com> wrote:
>>>
>>> I have posted an example that shows partition-then-fold at
>>> https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj
>>>
>>> I would be curious to know how this approach performs with your data.
>>> With the generated data I used, the partition+fold and partition+pmap
>>> approaches both used most of my cores and had similar perf.
>>>
>>> Hey Stuart,
>>>
>>> Thanks for getting back to me.
>>>
>>> I've updated the code for my word count example on GitHub with (I
>>> believe) something that works along the lines you suggest:
>>>
>>> https://github.com/paulbutcher/parallel-word-count
>>>
>>> Here are some sample runs on my machine (a 4-core retina MacBook Pro).
>>> Each of these runs counts the words in the first 10000 pages of
>>> Wikipedia:
>>>
>>> $ lein run 10000 ~/enwiki.xml sequential
>>> "Elapsed time: 23630.443 msecs"
>>> $ lein run 10000 ~/enwiki.xml fold
>>> "Elapsed time: 8697.79 msecs"
>>> $ lein run 10000 ~/enwiki.xml pmap 10000
>>> "Elapsed time: 27393.703 msecs"
>>> $ lein run 10000 ~/enwiki.xml pthenf 10000
>>> "Elapsed time: 37263.193 msecs"
>>>
>>> As you can see, the foldable-seq version gives an almost 3x speedup
>>> relative to the sequential version, and both the partition-then-pmap and
>>> partition-then-fold versions are significantly slower.
>>>
>>> The last argument for the partition-then-pmap and partition-then-fold
>>> versions is a partition size. I've tried various different sizes with no
>>> material effect:
>>>
>>> $ lein run 10000 ~/enwiki.xml pthenf 1000
>>> "Elapsed time: 43187.385 msecs"
>>> $ lein run 10000 ~/enwiki.xml pthenf 100000
>>> "Elapsed time: 35702.651 msecs"
>>> $ lein run 10000 ~/enwiki.xml pmap 1000
>>> "Elapsed time: 34087.314 msecs"
>>> $ lein run 10000 ~/enwiki.xml pmap 100000
>>> "Elapsed time: 47340.969 msecs"
>>>
>>> The performance of the partition-then-pmap version is actually much
>>> worse than the numbers above suggest. There's something very weird going
>>> on with (I guess) garbage collection - it takes a *long* time to finish
>>> after printing the elapsed time, and the performance is completely
>>> pathological with larger page counts.
>>>
>>> Bottom line: the only way I've been able to obtain any kind of speedup
>>> remains foldable-seq.
>>>
>>> I'd be very grateful indeed if you could take a look at how I've
>>> implemented partition-then-fold to make sure that I've correctly
>>> captured your intent. Or if you have any suggestions for anything else
>>> that might work, or to explain the poor performance of
>>> partition-then-pmap and partition-then-fold.
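For readers without the repo to hand, the partition-then-fold shape under discussion reads chunks of the lazy sequence into in-memory vectors, folds each one, and merges the partial results - roughly like this sketch (a reconstruction, not Paul's actual code; `frequencies-pthenf` is an illustrative name):

```clojure
;; Rough sketch of the partition-then-fold shape (illustrative, not the
;; actual code from the repo): copy each partition into a vector so that
;; r/fold can use fork/join over it, then merge the per-chunk counts.
(require '[clojure.core.reducers :as r])

(defn frequencies-pthenf [words chunk-size]
  (reduce (partial merge-with +)
          (map (fn [chunk]
                 (r/fold (partial merge-with +)
                         (fn [counts word]
                           (assoc counts word (inc (get counts word 0))))
                         (into [] chunk)))   ; <- the extra copy
               (partition-all chunk-size words))))
```

The `(into [] chunk)` step is the copy that Paul fingers as the likely culprit for the slowdown.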
>>> My guess is that the problem with partition-then-fold is the copying
>>> that's going on during the (into [] %). I can see that it is operating
>>> in parallel because the number of cores in use goes up, but the net
>>> effect is an overall slowdown rather than a speedup.
>>>
>>> That it performs worse than foldable-seq isn't surprising to me, given
>>> that it introduces an unnecessary copy.
>>>
>>> I still think that it's a crying shame to disallow folding over
>>> sequences - as the above shows, the gains both in performance and
>>> programming ease are significant, and it would only take a small
>>> modification to the reducers API to fix the holding-onto-head problem.
>>> What would be the downside of making this modification and allowing
>>> foldable sequences?
>>>
>>> On 28 Sep 2013, at 00:27, Stuart Halloway <stuart....@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> [...]
>>>
>>> Enjoying your book!
>>>
>>> Stu
>>>
>>> On Sat, May 25, 2013 at 12:34 PM, Paul Butcher <pa...@paulbutcher.com>
>>> wrote:
>>>
>>>> I'm currently working on a book on concurrent/parallel development for
>>>> The Pragmatic Programmers.
>>>> One of the subjects I'm covering is parallel programming in Clojure,
>>>> but I've hit a roadblock with one of the examples. I'm hoping that I
>>>> can get some help to work through it here.
>>>>
>>>> The example counts the words contained within a Wikipedia dump. It
>>>> should respond well to parallelisation (I have Java and Scala versions
>>>> that perform excellently) but I've been unable to come up with a nice
>>>> solution in Clojure.
>>>>
>>>> The code I'm working with is checked into GitHub:
>>>> https://github.com/paulbutcher/parallel-word-count
>>>>
>>>> The basic sequential algorithm is:
>>>>
>>>> (frequencies (mapcat get-words pages))
>>>>
>>>> If I run that on the first 10k pages of the Wikipedia dump, it takes
>>>> ~21s on my MacBook Pro.
>>>>
>>>> One way to parallelise it is to create a parallel version of
>>>> frequencies that uses reducers:
>>>>
>>>> (defn frequencies-fold [words]
>>>>   (r/fold (partial merge-with +)
>>>>           (fn [counts word]
>>>>             (assoc counts word (inc (get counts word 0))))
>>>>           words))
>>>>
>>>> And sure enough, if I use that, along with the foldable-seq utility
>>>> (https://github.com/paulbutcher/parallel-word-count/blob/master/src/foldable_seq/core.clj)
>>>> I posted about a while ago
>>>> (https://groups.google.com/d/msg/clojure/8RKCjF00ukQ/b5mmmOB5Uh4J),
>>>> it runs in ~8s, almost a 3x speedup - not bad, given that the parallel
>>>> version is unable to use transients.
>>>>
>>>> Unfortunately, as they currently stand, reducers have a fatal flaw
>>>> that means that, even with foldable-seq, they're basically useless
>>>> with lazy sequences. Reducers always hold onto the head of the
>>>> sequence they're given
>>>> (https://groups.google.com/d/msg/clojure-dev/qJo7z_9CVdw/ARnHe1bThuMJ),
>>>> so there's no way to use this approach for a complete Wikipedia dump
>>>> (which runs to around 40GiB).
>>>> So the other approach I've tried is to use pmap:
>>>>
>>>> (defn frequencies-pmap [words]
>>>>   (reduce (partial merge-with +)
>>>>           (pmap frequencies
>>>>                 (partition-all 10000 words))))
>>>>
>>>> But, for reasons I don't understand, this performs dreadfully - taking
>>>> ~26s, i.e. significantly slower than the sequential version.
>>>>
>>>> I've tried playing with different partition sizes without materially
>>>> affecting the result.
>>>>
>>>> So, what I'm looking for is either:
>>>>
>>>> a) An explanation for why the pmap-based solution performs so badly, or
>>>>
>>>> b) A way to fix the "holding onto head" problem that's inherent within
>>>> reducers.
>>>>
>>>> With the last of these in mind, it strikes me that the problem
>>>> fundamentally arises from the desire for reducers to follow the same
>>>> basic API as "normal" code. So:
>>>>
>>>> (reduce (filter ... (map ... coll)))
>>>>
>>>> becomes:
>>>>
>>>> (r/fold (r/filter ... (r/map ... coll)))
>>>>
>>>> A very small change to the reducers API - passing the collection to
>>>> the reduce and/or fold - would avoid the problem:
>>>>
>>>> (r/fold (r/filter ... (r/map ...)) coll)
>>>>
>>>> Anyway - I'd be very grateful for any insight into either of the above
>>>> questions, or for suggestions for an alternative approach that might
>>>> be more fruitful.
>>>>
>>>> Many thanks in advance,
>>>>
>>>> --
>>>> paul.butcher->msgCount++
>>>>
>>>> Snetterton, Castle Combe, Cadwell Park...
>>>> Who says I have a one track mind?
>>>>
>>>> http://www.paulbutcher.com/
>>>> LinkedIn: http://www.linkedin.com/in/paulbutcher
>>>> MSN: pa...@paulbutcher.com
>>>> AIM: paulrabutcher
>>>> Skype: paulrabutcher
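The API change Paul proposes - r/map and r/filter returning a transformation rather than taking the collection directly, with fold supplying the collection last - could be sketched in user space along these lines (a sketch only; `map*`, `filter*`, and `fold*` are hypothetical names, and this is not the actual clojure.core.reducers API):

```clojure
;; Sketch of the proposed "collection last" shape, built on top of the
;; existing reducers API. Hypothetical helpers, not from the thread.
(require '[clojure.core.reducers :as r])

(defn map*    [f]    (fn [coll] (r/map f coll)))
(defn filter* [pred] (fn [coll] (r/filter pred coll)))

;; fold* takes the collection as its final argument, so no reducer in
;; the pipeline needs to hold onto it up front.
(defn fold* [reducef xform coll]
  (r/fold reducef (xform coll)))

;; comp applies right-to-left: map* first, then filter*.
;; (fold* + (comp (filter* even?) (map* inc)) (vec (range 1 5))) ; => 6
```

Because the collection is only supplied at the final fold, nothing in the pipeline needs to retain a reference to the sequence head before reduction begins - which is exactly the property Paul is after.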
--
You received this message because you are subscribed to the Google Groups
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with
your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.