If the tree is too big build it on graphx....but it will need thorough analysis so that the partitions are well balanced...
On Tue, Sep 30, 2014 at 2:45 PM, Andy Twigg <[email protected]> wrote: > Hi Boromir, > > Assuming the tree fits in memory, and what you want to do is parallelize > the computation, the 'obvious' way is the following: > > * broadcast the tree T to each worker (ok since it fits in memory) > * construct an RDD for the deepest level - each element in the RDD is > (parent,data_at_node) > * aggregate this by key (=parent) -> RDD[parent,data] > * map each element (p, data) -> (parent(p), data) using T > * repeat until you have an RDD of size = 1 (assuming T is connected) > > If T cannot fit in memory, or is very deep, then there are more exotic > techniques, but hopefully this suffices. > > Andy > > > -- > http://www.cs.ox.ac.uk/people/andy.twigg/ > > On 30 September 2014 14:12, Boromir Widas <[email protected]> wrote: > >> Hello Folks, >> >> I have been trying to implement a tree reduction algorithm recently in >> spark but could not find suitable parallel operations. Assuming I have a >> general tree like the following - >> >> >> >> I have to do the following - >> 1) Do some computation at each leaf node to get an array of doubles.(This >> can be pre computed) >> 2) For each non leaf node, starting with the root node compute the sum of >> these arrays for all child nodes. So to get the array for node B, I need to >> get the array for E, which is the sum of G + H. >> >> ////////////////////// Start Snippet >> case class Node(name: String, children: Array[Node], values: >> Array[Double]) >> >> // read in the tree here >> >> def getSumOfChildren(node: Node) : Array[Double] = { >> if(node.isLeafNode) { >> return node.values >> } >> foreach(child in node.children) { >> // can use an accumulator here >> node.values = (node.values, >> getSumOfChildren(child)).zipped.map(_+_) >> } >> node.values >> } >> ////////////////////////// End Snippet >> >> Any pointers to how this can be done in parallel to use all cores will be >> greatly appreciated. >> >> Thanks, >> Boromir. >> >> >
