Thank you, one of my mistakes was to think that show() was an action on the RDD.

2017-07-13 17:52 GMT+02:00 Vadim Semenov <vadim.seme...@datadoghq.com>:
> You need to trigger an action on that rdd to checkpoint it.
>
> ```
> scala> spark.sparkContext.setCheckpointDir(".")
>
> scala> val df = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]
>
> scala> df.rdd.checkpoint()
>
> scala> df.rdd.isCheckpointed
> res2: Boolean = false
>
> scala> df.show()
> +------+---+
> |    _1| _2|
> +------+---+
> | Scala| 35|
> |Python| 30|
> |     R| 15|
> |  Java| 20|
> +------+---+
>
>
> scala> df.rdd.isCheckpointed
> res4: Boolean = false
>
> scala> df.rdd.count()
> res5: Long = 4
>
> scala> df.rdd.isCheckpointed
> res6: Boolean = true
> ```
>
> On Thu, Jul 13, 2017 at 11:35 AM, Bernard Jesop <bernard.je...@gmail.com> wrote:
>
>> Hi everyone, I just tried this simple program:
>>
>> import org.apache.spark.sql.SparkSession
>>
>> object CheckpointTest extends App {
>>   val spark = SparkSession
>>     .builder()
>>     .appName("Toto")
>>     .getOrCreate()
>>
>>   spark.sparkContext.setCheckpointDir(".")
>>
>>   val df = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
>>
>>   df.show()
>>
>>   df.rdd.checkpoint()
>>   println(if (df.rdd.isCheckpointed) "checkpointed" else "not checkpointed")
>> }
>>
>> But the result is still "not checkpointed".
>> Do you have any idea why? (knowing that the checkpoint file is created)
>>
>> Best regards,
>> Bernard JESOP
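
For the archive, here is a minimal sketch of the original program reworked along the lines of the reply above: mark the RDD for checkpointing first, then run an action on that same RDD. The object name `CheckpointSketch`, the `local[*]` master, and the `before`/`after`/`rows` values are assumptions added for illustration and are not part of the original program.

```scala
import org.apache.spark.sql.SparkSession

object CheckpointSketch extends App {
  // Assumption: local[*] is set only so the sketch runs without a cluster.
  val spark = SparkSession
    .builder()
    .appName("Toto")
    .master("local[*]")
    .getOrCreate()

  spark.sparkContext.setCheckpointDir(".")

  val df = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))

  // Keep a single reference so every call below targets the same RDD object.
  val rdd = df.rdd

  // checkpoint() only marks the RDD; nothing has been written at this point.
  rdd.checkpoint()
  val before = rdd.isCheckpointed

  // An action on this RDD runs a job, and the checkpoint is written as part of it.
  val rows = rdd.count()
  val after = rdd.isCheckpointed

  println(s"before count: $before, after count: $after, rows: $rows")

  spark.stop()
}
```

In line with the REPL session quoted above, `isCheckpointed` should be false before `count()` and true after it. `df.show()` does run a job, but it does not compute `df.rdd`, which seems to be why it leaves the flag unchanged. Since Spark 2.1 there is also `Dataset.checkpoint()`, which is eager by default and returns a new checkpointed Dataset; that may be simpler when the RDD itself is not needed.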