Re: Maintaining overall cumulative data in Spark Streaming

Silvio Fiorito Fri, 30 Oct 2015 07:01:56 -0700

If your update function returns None for a key, that key is removed from the 
state. If you’re restarting your app, you can delete your checkpoint directory 
to start from scratch rather than continuing from the previous state.
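
A minimal sketch of the per-key reset, written in plain Python so it runs 
without a cluster. The RESET sentinel and the apply_batch helper are 
hypothetical, made up for illustration; apply_batch only imitates how 
updateStateByKey calls the update function for every key and drops keys whose 
update returns None.

```python
from typing import Dict, List, Optional, Tuple

RESET = -1  # hypothetical sentinel meaning "clear this key" (an assumption)


def update_count(new_values: List[int], running_count: Optional[int]) -> Optional[int]:
    """Update function in the (new_values, previous_state) shape PySpark expects."""
    if RESET in new_values:
        return None  # returning None removes the key from the state
    return sum(new_values) + (running_count or 0)


def apply_batch(state: Dict[str, int], batch: List[Tuple[str, int]]) -> Dict[str, int]:
    """Plain-Python stand-in for one updateStateByKey step (not Spark itself)."""
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state: Dict[str, int] = {}
    # Keys that already have state are updated even if the batch has no values for them.
    for key in set(state) | set(grouped):
        result = update_count(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


state: Dict[str, int] = {}
state = apply_batch(state, [("spark", 1), ("spark", 1), ("kafka", 1)])
state = apply_batch(state, [("spark", 1)])
state_before_reset = dict(state)
state = apply_batch(state, [("spark", RESET)])  # drops only the "spark" key
```

Returning None clears one key at a time. Deleting the checkpoint directory 
discards the state for all keys, but only on the next restart.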

From: Sandeep Giri <[email protected]>
Date: Friday, October 30, 2015 at 9:29 AM
To: skaarthik oss <[email protected]>
Cc: dev <[email protected]>, user <[email protected]>
Subject: Re: Maintaining overall cumulative data in Spark Streaming

How do we reset the aggregated statistics to null?

Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

www.KnowBigData.com
Phone: +1-253-397-1945 (Office)

On Fri, Oct 30, 2015 at 9:49 AM, Sandeep Giri <[email protected]> wrote:

Yes, updateStateByKey worked.

Though there are some more complications.

On Oct 30, 2015 8:27 AM, "skaarthik oss" <[email protected]> wrote:
Did you consider the updateStateByKey operation?
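
For reference, a cumulative word count built on updateStateByKey might look 
like the sketch below, using the PySpark DStream API. The function names, the 
app name, and the socket text source are assumptions for illustration and do 
not come from this thread. Checkpointing is enabled because updateStateByKey 
requires it.

```python
from typing import List, Optional


def update_total(new_counts: List[int], running_total: Optional[int]) -> int:
    """Add this batch's counts for a word to its running total.

    running_total is None the first time a word is seen.
    """
    return sum(new_counts) + (running_total or 0)


def build_word_count_stream(checkpoint_dir: str, host: str, port: int,
                            batch_seconds: int = 10):
    """Wire up the stream; pyspark is imported lazily so the update
    function above can be exercised without a Spark installation."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CumulativeWordCount")  # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)
    ssc.checkpoint(checkpoint_dir)  # stateful operations need a checkpoint dir

    lines = ssc.socketTextStream(host, port)  # assumed source; any DStream works
    totals = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .updateStateByKey(update_total))
    totals.pprint()  # print the first few running totals of each batch
    return ssc
```

To run it, call build_word_count_stream with a checkpoint path, host, and port 
of your choosing, then call start() and awaitTermination() on the returned 
context.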

From: Sandeep Giri [mailto:[email protected]]
Sent: Thursday, October 29, 2015 3:09 PM
To: user <[email protected]>; dev <[email protected]>
Subject: Maintaining overall cumulative data in Spark Streaming

Dear All,

If a continuous stream of text is coming in and you have to keep publishing the 
overall word count so far since 0:00 today, what would you do?

Publishing the results for a window is easy, but if we have to keep aggregating 
the results, how should we go about it?

I tried keeping a StreamRDD with the aggregated count and repeatedly doing a 
fullOuterJoin against it, but that didn't work. It seems the StreamRDD gets 
reset.

Kindly help.

Regards,
Sandeep Giri
