Re: Spark Streaming reset state

2014-08-29 Thread Christophe Sebastien
You can use a tuple associating a timestamp with your running sum, and have COMPUTE_RUNNING_SUM reset the running sum to zero when the timestamp is more than 5 minutes old. You'll still have a leak if your keys keep changing, though. --Christophe 2014-08-29 9:00 GMT-07:00 Eko Susilo :
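
The suggestion above can be sketched in plain Java, without any Spark dependency. This is only an illustration under assumptions: the class `RunningSumReset`, the state type `TimedSum`, and the method `update` are hypothetical names, not anything from the original poster's code, and the actual COMPUTE_RUNNING_SUM function is not shown in the thread. The state pairs the running sum with the time it was last reset, and the sum restarts from zero once that timestamp is older than the allowed age.

```java
import java.util.Arrays;
import java.util.List;

public class RunningSumReset {

    // Hypothetical state tuple: when the sum was last reset, and the sum itself.
    static final class TimedSum {
        final long startMillis;
        final long sum;

        TimedSum(long startMillis, long sum) {
            this.startMillis = startMillis;
            this.sum = sum;
        }
    }

    // Hypothetical stand-in for the per-key update logic.
    // The sum is reset when there is no previous state, or when the
    // previous state's timestamp is more than maxAgeMillis old.
    static TimedSum update(List<Integer> newValues, TimedSum previous,
                           long nowMillis, long maxAgeMillis) {
        boolean expired = previous == null
                || nowMillis - previous.startMillis > maxAgeMillis;
        long start = expired ? nowMillis : previous.startMillis;
        long sum = expired ? 0L : previous.sum;
        for (int v : newValues) {
            sum += v;
        }
        return new TimedSum(start, sum);
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        TimedSum s = update(Arrays.asList(1, 2), null, 0L, fiveMinutes);
        s = update(Arrays.asList(4), s, 60_000L, fiveMinutes);
        System.out.println("sum after one minute: " + s.sum);
        s = update(Arrays.asList(5), s, fiveMinutes + 1, fiveMinutes);
        System.out.println("sum after reset: " + s.sum);
    }
}
```

In a real job this logic would sit inside the function passed to updateStateByKey, with the current batch time standing in for `nowMillis`. As the reply notes, state for keys that stop appearing would still accumulate unless the update function also drops stale keys.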

Re: Spark Streaming reset state

2014-08-29 Thread Eko Susilo
So "codes" currently holds RDDs containing the codes and their respective counters. I would like to find a way to reset those RDDs after some period of time. On Fri, Aug 29, 2014 at 5:55 PM, Sean Owen wrote: > "codes" is a DStream, not an RDD. The remember() method controls how > long Spark Stream

Re: Spark Streaming reset state

2014-08-29 Thread Sean Owen
"codes" is a DStream, not an RDD. The remember() method controls how long Spark Streaming holds on to the RDDs itself. Can you clarify what you mean by "reset"? codes provides a stream of RDDs that contain your computation over a window of time. New RDDs come with the computation over new data. On Fri, Au

Spark Streaming reset state

2014-08-29 Thread Eko Susilo
Hi all, I would like to ask for some advice about resetting a Spark stateful operation. I tried this: JavaStreamingContext jssc = new JavaStreamingContext(context, new Duration(5000)); jssc.remember(new Duration(5*60*1000)); jssc.checkpoint(ApplicationConstants.HDFS_STREAM_DIRECTORIES); JavaPairRec