At 2014-08-25 06:41:36 -0700, BertrandR <bertrand.rondepierre...@gmail.com> wrote:
> Unfortunately, this works well for extremely small graphs, but it becomes
> exponentially slow with the size of the graph and the number of iterations
> (doesn't finish 20 iterations with graphs having 48000 edges).
> [...]
> It seems to me that a lot of things are unnecessarily recomputed at each
> iterations whatever I try to do. I also did multiple changes to limit the
> number of dependency of each object, but it didn't change anything.
> [...]
> fusionBcst.unpersist(blocking = false)

The problem is almost certainly caused by the unpersist calls. If you comment
out all the unpersist lines, the program should run normally.

Unpersisting is very tricky because of the internal dependency structure of
graphs: a graph maintains a vertex RDD and an edge RDD, and each of them
depends on both the vertex and the edge RDD from the previous iteration. If
you unpersist one of them while a later iteration still needs it, Spark has
to recompute it from its lineage, and that cost grows with every iteration.

A future update to GraphX will unify them so that a graph only has one RDD,
and this will make it easier to unpersist correctly. Until then, unpersisting
may not be worth the trouble.

Ankur
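
For illustration, the "leave out the unpersist lines" advice could look roughly like the loop below. This is only a sketch, not the code from the original post: `iterate` and `step` are hypothetical names, with `step` standing in for whatever one iteration of the algorithm does. It assumes there is enough memory to keep every intermediate graph cached.

```scala
import org.apache.spark.graphx.Graph

// Hypothetical helper (not from the original post): applies `step` numIter
// times, caching each intermediate graph and never unpersisting any of them.
def iterate[VD, ED](initial: Graph[VD, ED], numIter: Int)(
    step: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
  var g = initial.cache()
  for (_ <- 1 to numIter) {
    // Cache the new graph so the next iteration reads it from memory.
    g = step(g).cache()
    // Run an action on both RDDs so the vertex and edge RDDs are actually
    // computed and stored now, rather than lazily later on.
    g.vertices.count()
    g.edges.count()
    // Deliberately no unpersist calls here: the previous graph's vertex and
    // edge RDDs stay cached, trading memory for predictable run time.
  }
  g
}
```

The trade-off is memory: every iteration's graph stays cached until the application ends or Spark evicts it, so this is mainly useful to check whether the unpersist calls are what causes the slowdown.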