Hello Folks, I have recently been trying to implement a tree-reduction algorithm in Spark, but could not find a suitable parallel operation. Assume I have a general tree like the following -
I have to do the following:

1) Do some computation at each leaf node to get an array of doubles (this can be precomputed).
2) For each non-leaf node, starting at the root, compute the element-wise sum of these arrays over all child nodes. So to get the array for node B, I need the array for E, which is the sum of G + H.

////////////////////// Start Snippet
case class Node(name: String, children: Array[Node], values: Array[Double])

// read in the tree here

def getSumOfChildren(node: Node): Array[Double] = {
  if (node.children.isEmpty) {
    // leaf: values were precomputed
    node.values
  } else {
    // can use an accumulator here
    node.children
      .map(getSumOfChildren)
      .foldLeft(node.values)((acc, childValues) => (acc, childValues).zipped.map(_ + _))
  }
}
////////////////////////// End Snippet

Any pointers on how this can be done in parallel, so that all cores are used, will be greatly appreciated.

Thanks,
Boromir.
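P.S. For what it's worth, here is a minimal self-contained sketch of the bottom-up sum using plain Scala Futures (no Spark) to evaluate children concurrently, so at least all local cores are used. The names (`sumValues`, the sample G/E/H tree) are just for illustration, not from any library:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Illustrative tree type: a node is a leaf iff it has no children.
case class Node(name: String,
                children: Seq[Node] = Seq.empty,
                values: Array[Double] = Array.empty)

// Recursively compute each node's array; sibling subtrees are
// evaluated concurrently via Future.traverse.
def sumValues(node: Node): Future[Array[Double]] =
  if (node.children.isEmpty)
    Future.successful(node.values)          // leaf: precomputed values
  else
    Future.traverse(node.children)(sumValues)
      .map(_.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }))

// Sample subtree from the question: E has leaves G and H, B has child E.
val g = Node("G", values = Array(1.0, 2.0))
val h = Node("H", values = Array(3.0, 4.0))
val e = Node("E", children = Seq(g, h))
val b = Node("B", children = Seq(e))

val result = Await.result(sumValues(b), Duration.Inf)
// result is the element-wise sum G + H, i.e. Array(4.0, 6.0)
```

This only parallelizes across one machine's cores, of course; distributing the reduction over a cluster would still need the tree expressed as Spark data (e.g. per-level reductions), which is exactly what I am asking about.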