Hello Folks, I have recently been trying to implement a tree-reduction algorithm in Spark, but could not find a suitable parallel operation. Assume I have a general tree like the following -
I have to do the following:

1) Do some computation at each leaf node to get an array of doubles (this can be precomputed).
2) For each non-leaf node, starting at the root, compute the element-wise sum of these arrays over all child nodes. So to get the array for node B, I need the array for E, which is the sum of G + H.

////////////////////// Start Snippet
case class Node(name: String, children: Array[Node], values: Array[Double])

// read in the tree here

def getSumOfChildren(node: Node): Array[Double] = {
  if (node.children.isEmpty) {
    // leaf: values were precomputed
    node.values
  } else {
    // can use an accumulator here
    node.children
      .map(getSumOfChildren)
      .foldLeft(node.values)((acc, childValues) => (acc, childValues).zipped.map(_ + _))
  }
}
////////////////////////// End Snippet

Any pointers on how this can be done in parallel, so that all cores are used, will be greatly appreciated.

Thanks,
Boromir.
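P.S. For what it's worth, here is a minimal self-contained sketch of the bottom-up sum using plain Scala Futures (no Spark) to evaluate children concurrently, so at least all local cores are used. The names (`sumValues`, the sample G/E/H tree) are just for illustration, not from any library:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Illustrative tree type: a node is a leaf iff it has no children.
case class Node(name: String,
                children: Seq[Node] = Seq.empty,
                values: Array[Double] = Array.empty)

// Recursively compute each node's array; sibling subtrees are
// evaluated concurrently via Future.traverse.
def sumValues(node: Node): Future[Array[Double]] =
  if (node.children.isEmpty)
    Future.successful(node.values)          // leaf: precomputed values
  else
    Future.traverse(node.children)(sumValues)
      .map(_.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }))

// Sample subtree from the question: E has leaves G and H, B has child E.
val g = Node("G", values = Array(1.0, 2.0))
val h = Node("H", values = Array(3.0, 4.0))
val e = Node("E", children = Seq(g, h))
val b = Node("B", children = Seq(e))

val result = Await.result(sumValues(b), Duration.Inf)
// result is the element-wise sum G + H, i.e. Array(4.0, 6.0)
```

This only parallelizes across one machine's cores, of course; distributing the reduction over a cluster would still need the tree expressed as Spark data (e.g. per-level reductions), which is exactly what I am asking about.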