
Re: parallel Reduce within a key

2014-06-20 Thread Michael Malak

How about a treeReduceByKey? :-)

On Friday, June 20, 2014 11:55 AM, DB Tsai wrote:

Currently, the reduce operation combines the results from the mappers sequentially, so it's O(n). Xiangrui is working on treeReduce, which is O(log(n)). Based on the benchmark, it dramatically increases the performan

Re: parallel Reduce within a key

2014-06-20 Thread DB Tsai

Currently, the reduce operation combines the results from the mappers sequentially, so it's O(n). Xiangrui is working on treeReduce, which is O(log(n)). Based on the benchmark, it dramatically increases performance. You can test the code in his own branch. https://github.com/apache/spark/pull/1110 Si
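
To make the O(n) versus O(log(n)) contrast concrete, here is a minimal sketch in plain Python. It is not Spark's implementation or the code in the pull request; the function names sequential_reduce and tree_reduce are made up for illustration, and the combine function is assumed to be associative. The sequential version chains n - 1 combines one after another, while the tree version combines neighbours pairwise, so the number of dependent rounds is about log2(n).

```python
from functools import reduce
from operator import add


def sequential_reduce(values, combine):
    """Fold left to right: each combine waits on the previous one (n - 1 steps)."""
    return reduce(combine, values)


def tree_reduce(values, combine):
    """Combine neighbours pairwise, one level at a time.

    Returns (result, levels). Combines within a level do not depend on
    each other, so they could run in parallel; `levels` is the length of
    the critical path. Assumes `combine` is associative.
    """
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    levels = 0
    while len(level) > 1:
        # Pair up elements (0,1), (2,3), ...
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        # An unpaired last element is carried to the next level unchanged.
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    total, depth = tree_reduce(range(1, 9), add)
    print(total, depth)
```

The total number of combines is still n - 1 in both versions; what shrinks is the number of rounds that must happen one after another, which is where the speedup on a cluster would come from.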