subject:"ReduceByKeyAndWindow does repartitioning twice on recovering from checkpoint"

Re: ReduceByKeyAndWindow does repartitioning twice on recovering from checkpoint

2015-11-16 Thread Tathagata Das

ReduceByKeyAndWindow with inverse function has certain unique characteristics - it reuses a lot of the intermediate partitioned, partially-reduced data. For every new batch, it "reduces" the new data, and "inverse reduces" the old out-of-window data. That inverse-reduce needs data from an old batch

ReduceByKeyAndWindow does repartitioning twice on recovering from checkpoint

2015-11-15 Thread kundan kumar

Hi, I am using spark streaming check-pointing mechanism and reading the data from Kafka. The window duration for my application is 2 hrs with a sliding interval of 15 minutes. So, my batches run at following intervals... - 09:45 - 10:00 - 10:15 - 10:30 - and so on When my job is