Hi,

I am trying to understand the behavior of rdd.checkpoint() in Spark. I am running the JavaPageRank <https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaPageRank.java> example on a 1 GB graph and checkpointing the *ranks* RDD inside each iteration (between lines 125 and 126 in the linked file). Spark execution starts when it hits the *collect()* action. I expected the intermediate ranks to be materialized and written to the checkpoint dir after each iteration, but it seems the RDD is only written once, at the end of the program, even though I invoke ranks.checkpoint() inside the for loop. Is that the default behavior?
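If it helps, here is a minimal, stripped-down sketch of what I am doing (not the actual PageRank job; the class name, checkpoint dir path, and the simple map() are placeholders standing in for the real rank update at line 125 of the example):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Minimal reproduction of my setup: an iteratively updated RDD that is
    // cached and checkpointed on every iteration.
    public class CheckpointPerIteration {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CheckpointPerIteration");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.setCheckpointDir("hdfs:///tmp/checkpoints");  // example path, not my real dir

        JavaRDD<Double> ranks = sc.parallelize(Arrays.asList(1.0, 1.0, 1.0));

        for (int i = 0; i < 10; i++) {
          // Placeholder for the join/reduceByKey/mapValues rank update in JavaPageRank.
          ranks = ranks.map(r -> 0.15 + 0.85 * r);

          ranks.cache();       // cache first so checkpointing does not recompute the lineage
          ranks.checkpoint();  // mark this iteration's result for checkpointing
        }

        // Execution only starts here; my question is whether each iteration's RDD
        // should have been written to the checkpoint dir, or only the final one.
        System.out.println(ranks.collect());

        sc.stop();
      }
    }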
Note that I am caching the RDD before checkpointing in order to avoid recomputation (as in the sketch above).

Best Regards,
Tarek