Hi,

I am trying to understand the behavior of rdd.checkpoint() in Spark. I am running the JavaPageRank <https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaPageRank.java> example on a 1 GB graph and checkpointing the *ranks* RDD inside each iteration (between lines 125 and 126 in the linked file). Spark execution starts when it hits the *collect()* action. I expected the intermediate ranks to be materialized and written to the checkpoint dir after each iteration, but it seems the RDD is only written once, at the end of the program, even though I invoke ranks.checkpoint() inside the for loop. Is that the default behavior?
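If it helps, here is a minimal, stripped-down sketch of what I am doing (not the actual PageRank job; the class name, checkpoint dir path, and the simple map() are placeholders standing in for the real rank update at line 125 of the example):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Minimal reproduction of my setup: an iteratively updated RDD that is
    // cached and checkpointed on every iteration.
    public class CheckpointPerIteration {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CheckpointPerIteration");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.setCheckpointDir("hdfs:///tmp/checkpoints");  // example path, not my real dir

        JavaRDD<Double> ranks = sc.parallelize(Arrays.asList(1.0, 1.0, 1.0));

        for (int i = 0; i < 10; i++) {
          // Placeholder for the join/reduceByKey/mapValues rank update in JavaPageRank.
          ranks = ranks.map(r -> 0.15 + 0.85 * r);

          ranks.cache();       // cache first so checkpointing does not recompute the lineage
          ranks.checkpoint();  // mark this iteration's result for checkpointing
        }

        // Execution only starts here; my question is whether each iteration's RDD
        // should have been written to the checkpoint dir, or only the final one.
        System.out.println(ranks.collect());

        sc.stop();
      }
    }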
Note that I am caching the RDD before checkpointing in order to avoid recomputation (as in the sketch above).

Best Regards,
Tarek