Re: Incredible slow iterative computation

2014-05-06 Thread Andrea Esposito

Re: Incredible slow iterative computation

2014-05-05 Thread Earthson
You create a new broadcast object at each step of the iteration, so the broadcasts will eat up all of the memory. You may need to set "spark.cleaner.ttl" to a small enough value.
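
A minimal sketch of setting the property mentioned above, assuming a Scala application that builds its own SparkConf. The application name, the local master, and the 600-second value are placeholders chosen for illustration, not recommendations from the thread. As far as I know, "spark.cleaner.ttl" belongs to the Spark releases of that period and was dropped from later versions, so check the configuration documentation for the version in use.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CleanerTtlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cleaner-ttl-sketch") // hypothetical application name
      .setMaster("local[*]")            // assumption: local run for illustration
      // TTL in seconds for old metadata; 600 is an arbitrary illustrative value.
      .set("spark.cleaner.ttl", "600")

    val sc = new SparkContext(conf)
    try {
      // Read the setting back from the running context to confirm it was applied.
      println(sc.getConf.get("spark.cleaner.ttl"))
    } finally {
      sc.stop()
    }
  }
}
```

A value that is too small may also clear data that later iterations still need, so it probably has to be longer than one full iteration.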

Re: Incredible slow iterative computation

2014-05-05 Thread Matei Zaharia
It may be slow because of serialization (have you tried Kryo there?) or just because at some point the data starts to be on disk. Try profiling the tasks while it’s running (e.g. just use jstack to see what they’re doing) and definitely try Kryo if you’re currently using Java Serialization. Kryo
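
A minimal sketch of switching to Kryo, assuming a Scala application that builds its own SparkConf. The Vertex record type, the application name, and the local master are hypothetical; registerKryoClasses is, as far as I know, only available in Spark releases newer than the ones current when this thread was written, and the registration line can be dropped without affecting the serializer switch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type standing in for the per-record structure in the job.
final case class Vertex(id: Long, rank: Double)

object KryoSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-sketch") // hypothetical application name
      .setMaster("local[*]")     // assumption: local run for illustration
      // Replace the default Java serialization with Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: registering classes lets Kryo avoid writing full class names.
      .registerKryoClasses(Array[Class[_]](classOf[Vertex]))

    val sc = new SparkContext(conf)
    try {
      // Serialized in-memory storage is one place where the serializer matters.
      val vertices = sc.parallelize(1L to 1000L)
        .map(id => Vertex(id, 1.0))
        .persist(StorageLevel.MEMORY_ONLY_SER)
      println(vertices.count())
    } finally {
      sc.stop()
    }
  }
}
```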

Re: Incredible slow iterative computation

2014-05-05 Thread Andrea Esposito
Update: the checkpoint is not actually being performed. I checked with the "isCheckpointed" method, but it always returns false. ???

2014-05-05 23:14 GMT+02:00 Andrea Esposito:
> Checkpoint doesn't help, it seems. I do it at each iteration/superstep.
>
> Looking deeply, the RDDs are recomputed just few times at
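
One possible explanation, offered as a guess and not as a diagnosis of the job in this thread: checkpoint() only marks an RDD, and the data is written when an action later runs on it, so isCheckpointed is expected to stay false until then. A checkpoint directory also has to be set first. A minimal sketch of that behaviour, with a hypothetical object name, a temporary directory, and a local master as assumptions:

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointLazinessSketch {
  // Returns isCheckpointed as observed before and after the first action.
  def flags(sc: SparkContext): (Boolean, Boolean) = {
    // A checkpoint directory must be set before checkpoint() is called.
    sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoints").toString)

    val rdd = sc.parallelize(1 to 100).map(_ * 2).cache()
    rdd.checkpoint()                 // only marks the RDD for checkpointing
    val before = rdd.isCheckpointed  // no job has run on the RDD yet

    rdd.count()                      // the action triggers the checkpoint write
    val after = rdd.isCheckpointed
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("checkpoint-laziness-sketch").setMaster("local[*]"))
    try {
      val (before, after) = flags(sc)
      println(s"before action: $before, after action: $after")
    } finally {
      sc.stop()
    }
  }
}
```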

Re: Incredible slow iterative computation

2014-05-02 Thread Andrew Ash
If you end up with a really long dependency tree between RDDs (like 100+), people have reported success with using the .checkpoint() method. This computes the RDD and then saves it, flattening the dependency tree. It turns out that having a really long RDD dependency graph causes serialization siz
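
A minimal sketch of checkpointing every few iterations so that the lineage is cut periodically, assuming a simple loop where each superstep maps over the previous RDD. The function name, the toy data, and the values for supersteps and checkpointEvery are hypothetical and only illustrate the pattern; they are not taken from the job discussed here.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PeriodicCheckpointSketch {
  // Runs `supersteps` map iterations and checkpoints every `checkpointEvery` steps.
  def iterate(sc: SparkContext, supersteps: Int, checkpointEvery: Int): RDD[Long] = {
    var current: RDD[Long] = sc.parallelize(1L to 1000L).cache()
    for (step <- 1 to supersteps) {
      val next = current.map(_ + 1).cache()
      if (step % checkpointEvery == 0) {
        next.checkpoint()  // mark before any action runs on `next`
      }
      next.count()         // action: fills the cache and writes a pending checkpoint
      current.unpersist()  // the previous step's cached data is no longer needed
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("periodic-checkpoint-sketch").setMaster("local[*]"))
    try {
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoints").toString)
      val result = iterate(sc, supersteps = 20, checkpointEvery = 5)
      // toDebugString prints the lineage, which shows how far back it reaches.
      println(result.toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

Caching the RDD before the checkpoint avoids computing it twice, since writing the checkpoint otherwise recomputes the RDD from its lineage.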

Re: Incredible slow iterative computation

2014-05-02 Thread Andrea Esposito
Sorry for the very late answer. I carefully followed what you pointed out and figured out that the structure used for each record was too big, with many small objects. After changing it, the memory usage decreased drastically. Despite that, I'm still struggling with the behaviour of decreasing performa

Re: Incredible slow iterative computation

2014-04-14 Thread Andrew Ash
A lot of your time is being spent in garbage collection (second image). Maybe your dataset doesn't easily fit into memory? Can you reduce the number of new objects created in myFun? How big are your heap sizes? Another observation is that in the 4th image some of your RDDs are massive and some