After every checkpointing interval, the latest state RDD is stored to HDFS in its entirety. Along with that, the series of DStream transformations that was set up with the streaming context is also stored in HDFS (the whole DAG of DStream objects is serialized and saved).
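
For reference, here is a minimal sketch of how a job with updateStateByKey is typically wired up so that both pieces get checkpointed. The checkpoint directory, the socket source on localhost:9999, the app name, and the 1 s / 10 s intervals are all placeholder assumptions, not values from your setup.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on Spark 1.x

object StatefulWordCount {
  // Hypothetical HDFS path; any fault-tolerant file system location works.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  // Merge this batch's values for a key into the running count for that key.
  def updateCount(newValues: Seq[Int], running: Option[Int]): Option[Int] =
    Some(running.getOrElse(0) + newValues.sum)

  // Builds the whole DStream graph. This graph is what gets serialized
  // into the checkpoint directory alongside the state RDDs.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint(checkpointDir) // required when using updateStateByKey

    // Assumed input: lines of text from a socket on localhost:9999.
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val counts = words.map(w => (w, 1)).updateStateByKey[Int](updateCount _)

    // Optional: set how often the full state RDD is written out.
    counts.checkpoint(Seconds(10))
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from an existing checkpoint if one is present, else build fresh.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With StreamingContext.getOrCreate, a restarted driver reads the saved DStream graph back from the checkpoint directory instead of calling createContext again, which is why all of the DStream setup has to live inside that function.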

TD

On Wed, Jul 16, 2014 at 5:38 PM, Yan Fang <yanfang...@gmail.com> wrote:

> Hi guys,
>
> am wondering how the RDD checkpointing
> <https://spark.apache.org/docs/latest/streaming-programming-guide.html#RDD
> Checkpointing> works in Spark Streaming. When I use updateStateByKey, does
> the Spark store the entire state (at one time point) into the HDFS or only
> put the transformation into the HDFS? Thank you.
>
> Best,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>