You can use a tuple that associates a timestamp with your running sum, and have
COMPUTE_RUNNING_SUM reset the running sum to zero when the timestamp is
more than 5 minutes old.
You'll still have a leak doing so if your keys keep changing, though.
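
A minimal sketch of that idea in plain Java, with no Spark dependency. The names ExpiringRunningSum, TimedSum, update and MAX_AGE_MILLIS are hypothetical, and it assumes the timestamp records when the sum was last reset. In a real job this logic would live inside the function you pass to your stateful operation (for example updateStateByKey), adapted to that function's actual signature.

```java
import java.util.List;

// Hypothetical helper: a running sum that starts over once it is too old.
class ExpiringRunningSum {

    // The "tuple": the running sum plus the time it was last reset.
    public static final class TimedSum {
        public final long startMillis;
        public final int sum;

        public TimedSum(long startMillis, int sum) {
            this.startMillis = startMillis;
            this.sum = sum;
        }
    }

    // Assumed expiry: 5 minutes, as discussed in this thread.
    public static final long MAX_AGE_MILLIS = 5 * 60 * 1000L;

    // Combines the new values of one batch with the previous state.
    // A null state means the key has not been seen before.
    public static TimedSum update(List<Integer> newValues, TimedSum state, long nowMillis) {
        int batchSum = 0;
        for (int v : newValues) {
            batchSum += v;
        }
        // No state yet, or state older than 5 minutes: restart from zero.
        if (state == null || nowMillis - state.startMillis > MAX_AGE_MILLIS) {
            return new TimedSum(nowMillis, batchSum);
        }
        // Otherwise keep the original timestamp and keep accumulating.
        return new TimedSum(state.startMillis, state.sum + batchSum);
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside update, is only there to make the sketch easy to test.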
--Christophe
2014-08-29 9:00 GMT-07:00 Eko Susilo :
So "codes" currently holds RDDs containing the codes and their respective
counters. I would like to find a way to reset those RDDs after some period of
time.
On Fri, Aug 29, 2014 at 5:55 PM, Sean Owen wrote:
"codes" is a DStream, not an RDD. The remember() method controls how
long Spark Streaming holds on to the RDDs themselves. Can you clarify what you
mean by "reset"? codes provides a stream of RDDs that contain your
computation over a window of time. New RDDs come with the computation
over new data.
On Fri, Au
Hi all,
I would like to ask for some advice about resetting a Spark stateful operation.
So I tried this:
JavaStreamingContext jssc = new JavaStreamingContext(context, new Duration(5000));
jssc.remember(new Duration(5 * 60 * 1000));
jssc.checkpoint(ApplicationConstants.HDFS_STREAM_DIRECTORIES);
JavaPairRec