Thank you. One of my mistakes was to think that show() counted as an action on df.rdd; it runs on the DataFrame, so it does not trigger the RDD checkpoint.

2017-07-13 17:52 GMT+02:00 Vadim Semenov <vadim.seme...@datadoghq.com>:

> You need to trigger an action on that rdd to checkpoint it.
>
> ```
> scala> spark.sparkContext.setCheckpointDir(".")
>
> scala> val df = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]
>
> scala> df.rdd.checkpoint()
>
> scala> df.rdd.isCheckpointed
> res2: Boolean = false
>
> scala> df.show()
> +------+---+
> |    _1| _2|
> +------+---+
> | Scala| 35|
> |Python| 30|
> |     R| 15|
> |  Java| 20|
> +------+---+
>
>
> scala> df.rdd.isCheckpointed
> res4: Boolean = false
>
> scala> df.rdd.count()
> res5: Long = 4
>
> scala> df.rdd.isCheckpointed
> res6: Boolean = true
> ```
>
> On Thu, Jul 13, 2017 at 11:35 AM, Bernard Jesop <bernard.je...@gmail.com>
> wrote:
>
>> Hi everyone, I just tried this simple program:
>>
>>     import org.apache.spark.sql.SparkSession
>>
>>     object CheckpointTest extends App {
>>       val spark = SparkSession
>>         .builder()
>>         .appName("Toto")
>>         .getOrCreate()
>>
>>       spark.sparkContext.setCheckpointDir(".")
>>       val df = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
>>
>>       df.show()
>>
>>       df.rdd.checkpoint()
>>       println(if (df.rdd.isCheckpointed) "checkpointed" else "not checkpointed")
>>     }
>>
>> But the result is still *"not checkpointed"*.
>> Do you have any idea why? (knowing that the checkpoint file is created)
>>
>> Best regards,
>> Bernard JESOP
>>
>
>
