In the update function you pass to updateStateByKey, you can return None for a key, and that key is removed from the state. If you are restarting your app, you can delete your checkpoint directory to start from scratch rather than continuing from the previous state.
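Here is a minimal sketch of such an update function, assuming the PySpark API (the thread does not say which language is in use). The RESET sentinel and the name update_count are hypothetical, chosen only for illustration. The idea is that a marker value arriving in the stream for a key signals that its running total should be dropped.

```python
# Hypothetical sentinel: a record carrying this value asks for the key's state to be dropped.
RESET = "__RESET__"


def update_count(new_values, running_count):
    """Update function in the shape updateStateByKey expects.

    new_values    -- list of values that arrived for this key in the current batch
    running_count -- previous state for this key, or None if the key has no state yet
    """
    if RESET in new_values:
        # Returning None removes this key from the state.
        return None
    # Otherwise add this batch's counts to the running total.
    return (running_count or 0) + sum(new_values)


# Wiring into a streaming job (not executed here; assumes an existing
# StreamingContext `ssc` and a DStream of (word, 1) pairs named `pairs`):
#   ssc.checkpoint("/path/to/checkpoint")   # stateful operations need a checkpoint dir
#   totals = pairs.updateStateByKey(update_count)
#   totals.pprint()
```

Called directly, update_count([1, 2], None) starts a new total for the key, update_count([], 3) carries the existing total forward unchanged, and update_count([1, RESET], 10) returns None, so the key disappears from the state.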
From: Sandeep Giri <[email protected]>
Date: Friday, October 30, 2015 at 9:29 AM
To: skaarthik oss <[email protected]>
Cc: dev <[email protected]>, user <[email protected]>
Subject: Re: Maintaining overall cumulative data in Spark Streaming

How do we reset the aggregated statistics to null?

Regards,
Sandeep Giri

On Fri, Oct 30, 2015 at 9:49 AM, Sandeep Giri <[email protected]> wrote:

Yes, update state by key worked. Though there are some more complications.

On Oct 30, 2015 8:27 AM, "skaarthik oss" <[email protected]> wrote:

Did you consider the UpdateStateByKey operation?

From: Sandeep Giri [mailto:[email protected]]
Sent: Thursday, October 29, 2015 3:09 PM
To: user <[email protected]>; dev <[email protected]>
Subject: Maintaining overall cumulative data in Spark Streaming

Dear All,

If a continuous stream of text is coming in and you have to keep publishing the overall word count so far since 0:00 today, what would you do?

Publishing the results for a window is easy, but if we have to keep aggregating the results, how do we go about it?

I have tried to keep a StreamRDD with the aggregated count and keep doing a fullouterjoin, but it didn't work. It seems like the StreamRDD gets reset.

Kindly help.

Regards,
Sandeep Giri
