Re: TreeReduce Functionality in Spark

2015-06-04 Thread DB Tsai
For the first round, you will have 16 reducers working since you have 32 partitions. Each pair of the 32 partitions shares the same key, so reduceByKey sends both members of a pair to the same reducer. After this step is done, you will have 16 partitions, so the next round will have 8 reducers. Sincerely, DB Tsai
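DB's pairing scheme can be sketched in plain Python (a simulation only, not Spark code; the `tree_reduce_round` helper and the `i % target` keying are illustrative assumptions standing in for `mapPartitionsWithIndex` + `reduceByKey`):

```python
# Simulate one round of tree reduction: each partition's partial result is
# keyed so that two partitions share a key, and the values under each key
# are combined -- halving the number of partitions per round.

def tree_reduce_round(partition_values, reduce_fn):
    """Halve the partition count by pairing partitions via a shared key."""
    target = len(partition_values) // 2  # number of reducers this round
    buckets = {}
    for i, value in enumerate(partition_values):
        key = i % target  # two partitions map onto each reducer key
        buckets[key] = reduce_fn(buckets[key], value) if key in buckets else value
    return [buckets[k] for k in sorted(buckets)]

# 32 partitions, each holding a partial sum
parts = list(range(32))
round1 = tree_reduce_round(parts, lambda a, b: a + b)   # 16 values
round2 = tree_reduce_round(round1, lambda a, b: a + b)  # 8 values
```

Note that the combined total is preserved across rounds; only the number of partitions shrinks.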

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Raghav Shankar
Hey DB, Thanks for the reply! I still don't think this answers my question. For example, if I have a top() action being executed and I have 32 workers (32 partitions), and I choose a depth of 4, what does the overlay of intermediate reducers look like? How many reducers are there excluding the mas

Re: TreeReduce Functionality in Spark

2015-06-04 Thread DB Tsai
By default, the depth of the tree is 2. Each partition will be one node. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Raghav Shankar
Hey Reza, Thanks for your response! Your response clarifies some of my initial thoughts. However, what I don't understand is how the depth of the tree is used to identify how many intermediate reducers there will be, and how many partitions are sent to the intermediate reducers. Could you provide

Re: TreeReduce Functionality in Spark

2015-06-04 Thread Reza Zadeh
In a regular reduce, all partitions have to send their reduced value to a single machine, and that machine can become a bottleneck. In a treeReduce, the partitions talk to each other in a logarithmic number of rounds. Imagine a binary tree that has all the partitions at its leaves and the root wil
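The binary tree Reza describes can be simulated in a few lines of plain Python (an illustrative sketch, not a Spark API; `binary_tree_reduce` is a hypothetical helper). With all partitions at the leaves, n leaves are combined in ceil(log2(n)) rounds instead of one machine receiving all n values at once:

```python
def binary_tree_reduce(values, fn):
    """Pair up neighbors each round until one value remains.

    Returns (final_value, number_of_rounds): the logarithmic number of
    rounds is the point of treeReduce versus a flat reduce.
    """
    rounds = 0
    while len(values) > 1:
        nxt = []
        for i in range(0, len(values), 2):
            pair = values[i:i + 2]
            # An odd value out is carried to the next round unchanged.
            nxt.append(fn(pair[0], pair[1]) if len(pair) == 2 else pair[0])
        values = nxt
        rounds += 1
    return values[0], rounds

# 32 leaf partitions: 32 -> 16 -> 8 -> 4 -> 2 -> 1, i.e. 5 rounds.
total, rounds = binary_tree_reduce(list(range(32)), lambda a, b: a + b)
```

In the flat-reduce case the single receiving machine is the bottleneck; here no machine ever receives more than two values per round.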