Re: Does RDD checkpointing store the entire state in HDFS?

Tathagata Das Wed, 16 Jul 2014 18:55:16 -0700

After every checkpointing interval, the latest state RDD is stored to HDFS
in its entirety. Along with that, the series of DStream transformations
that was set up with the streaming context is also stored in HDFS (the
whole DAG of DStream objects is serialized and saved).
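
For illustration, here is a minimal PySpark sketch of a stateful word count
with checkpointing enabled. It is not taken from the original question; the
function names (update_count, run_word_count), the socket source on
localhost:9999, the local[2] master, and the 10-second checkpoint interval
are all assumptions chosen for the example, and the checkpoint directory is
whatever path you pass in.

```python
def update_count(new_values, running_count):
    """State update function for updateStateByKey.

    new_values    -- list of counts that arrived for this key in the batch
    running_count -- previous state for this key, or None if there is none
    """
    return sum(new_values) + (running_count or 0)


def run_word_count(checkpoint_dir, host="localhost", port=9999):
    """Start a stateful word count (hypothetical helper, blocks until stopped)."""
    # Imported here so update_count can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "StatefulWordCount")
    ssc = StreamingContext(sc, 1)  # 1-second batches

    # Directory (for example on HDFS) where checkpoint data is written.
    ssc.checkpoint(checkpoint_dir)

    lines = ssc.socketTextStream(host, port)
    pairs = lines.flatMap(lambda line: line.split(" ")).map(lambda w: (w, 1))

    # The state DStream: one running count per word.
    counts = pairs.updateStateByKey(update_count)

    # Checkpoint the state RDDs every 10 seconds (arbitrary choice).
    counts.checkpoint(10)

    counts.pprint()
    ssc.start()
    ssc.awaitTermination()
```

Calling run_word_count with an HDFS path as checkpoint_dir (from a script
submitted with spark-submit) would start the job; the state RDD produced by
updateStateByKey is what gets saved in full at each checkpoint interval.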


TD


On Wed, Jul 16, 2014 at 5:38 PM, Yan Fang <yanfang...@gmail.com> wrote:

> Hi guys,
>
> I am wondering how RDD checkpointing
> <https://spark.apache.org/docs/latest/streaming-programming-guide.html#RDD Checkpointing>
> works in Spark Streaming. When I use updateStateByKey, does Spark store
> the entire state (at one point in time) in HDFS, or does it only store
> the transformations in HDFS? Thank you.
>
> Best,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
