Re: Repartition taking place for all previous windows even after checkpointing

Abhishek Anand Mon, 01 Feb 2016 02:31:39 -0800

Any insights on this ?


On Fri, Jan 29, 2016 at 1:08 PM, Abhishek Anand <abhis.anan...@gmail.com>
wrote:

> Hi All,
>
> Can someone help me with the following doubts regarding checkpointing :
>
> My code flow is something like follows ->
>
> 1) create direct stream from kafka
> 2) repartition kafka stream
> 3)  mapToPair followed by reduceByKey
> 4)  filter
> 5)  reduceByKeyAndWindow without the inverse function
> 6)  write to cassandra
>
> Now when I restart my application from checkpoint, I see repartition and
> other steps being called for the previous windows which takes longer and
> delays my aggregations.
>
> My understanding  was that once data checkpointing is done it should not
> re-read from kafka and use the saved RDDs but guess I am wrong.
>
> Is there a way to avoid the repartition or any workaround for this.
>
> Spark Version is 1.4.0
>
> Cheers !!
> Abhi
>

Re: Repartition taking place for all previous windows even after checkpointing

Reply via email to