At 2014-08-25 06:41:36 -0700, BertrandR <bertrand.rondepierre...@gmail.com> 
wrote:
> Unfortunately, this works well for extremely small graphs, but it becomes
> exponentially slow with the size of the graph and the number of iterations
> (doesn't finish 20 iterations with graphs having 48000 edges).
> [...]
> It seems to me that a lot of things are unnecessarily recomputed at each
> iteration whatever I try to do. I also made multiple changes to limit the
> number of dependencies of each object, but it didn't change anything.
> [...]
>       fusionBcst.unpersist(blocking = false)

The problem is almost certainly caused by the unpersist calls. If you comment 
out all the unpersist lines, the program should run normally.
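
For illustration, here is a minimal sketch of an iterative loop that caches each 
new graph and never unpersists the old ones. The toy graph, the `step` function, 
and the object name are assumptions made up for this example, not taken from 
your program:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object IterateWithoutUnpersist {
  // Hypothetical update step: adds 1 to every vertex attribute.
  def step(g: Graph[Int, Int]): Graph[Int, Int] =
    g.mapVertices((_, attr) => attr + 1)

  def run(sc: SparkContext, numIterations: Int): Graph[Int, Int] = {
    // Toy three-edge cycle standing in for the real input graph.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0), Edge(3L, 1L, 0)))
    var g: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0).cache()

    for (_ <- 1 to numIterations) {
      g = step(g).cache()
      // Materialize both RDDs so the next iteration reads cached data.
      g.vertices.count()
      g.edges.count()
      // No unpersist calls: older graphs stay cached until Spark evicts them.
    }
    g
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("iterate-without-unpersist").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      run(sc, 20).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

This trades memory for speed: cached graphs from earlier iterations are only 
dropped when Spark evicts them under memory pressure.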

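Unpersisting is very tricky because of the internal dependency structure of 
graphs: each graph maintains a vertex RDD and an edge RDD, and each of these 
depends on both the vertex and edge RDDs from the previous iteration. If either 
one is unpersisted while a later graph still needs it, Spark has to recompute it 
from its lineage.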

A future update to GraphX will unify them so that a graph only has one RDD, and 
this will make it easier to unpersist correctly. Until then, unpersisting may 
not be worth the trouble.
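
If you still want to free old iterations, the ordering that matters is roughly 
this: cache and materialize the new graph first, and only then unpersist both 
RDDs of the previous one. The sketch below follows the pattern that, as far as I 
recall, GraphX's own Pregel loop uses. The `step` function, the toy graph, and 
the object name are assumptions for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object UnpersistPreviousIteration {
  // Hypothetical update step: adds 1 to every vertex attribute.
  def step(g: Graph[Int, Int]): Graph[Int, Int] =
    g.mapVertices((_, attr) => attr + 1)

  def run(sc: SparkContext, numIterations: Int): Graph[Int, Int] = {
    // Toy three-edge cycle standing in for the real input graph.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0), Edge(3L, 1L, 0)))
    var g: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0).cache()
    g.vertices.count()
    g.edges.count()

    var prev: Graph[Int, Int] = null
    for (_ <- 1 to numIterations) {
      prev = g
      g = step(prev).cache()
      // Materialize the new graph while the previous one is still cached.
      g.vertices.count()
      g.edges.count()
      // Only now release the previous iteration, vertices and edges both.
      prev.unpersistVertices(blocking = false)
      // Guard (an assumption on my part): skip the edges if the new graph
      // happens to share the same edge RDD object as the old one.
      if (prev.edges ne g.edges) {
        prev.edges.unpersist(blocking = false)
      }
    }
    g
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("unpersist-previous").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      run(sc, 20).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unpersisting before the two count() calls would be the mistake: the new graph 
would then have to be rebuilt from uncached parents.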

Ankur
