Any insights on this ?
On Fri, Jan 29, 2016 at 1:08 PM, Abhishek Anand <abhis.anan...@gmail.com> wrote: > Hi All, > > Can someone help me with the following doubts regarding checkpointing : > > My code flow is something like follows -> > > 1) create direct stream from kafka > 2) repartition kafka stream > 3) mapToPair followed by reduceByKey > 4) filter > 5) reduceByKeyAndWindow without the inverse function > 6) write to cassandra > > Now when I restart my application from checkpoint, I see repartition and > other steps being called for the previous windows which takes longer and > delays my aggregations. > > My understanding was that once data checkpointing is done it should not > re-read from kafka and use the saved RDDs but guess I am wrong. > > Is there a way to avoid the repartition or any workaround for this. > > Spark Version is 1.4.0 > > Cheers !! > Abhi >