The window can be larger, but the batch/slide interval has to be smaller (presumably every 5-10 seconds?). Most of the built-in window functions take a separate slide-duration parameter that you can override, as long as it's a multiple of the streaming context's batch interval.
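For example, here is a minimal sketch of overriding the slide duration; the socket source, app name, and durations are hypothetical, purely for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    // Hypothetical 10-second batch interval.
    val conf = new SparkConf().setAppName("PeriodicCounts")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: one event type per line.
    val events = ssc.socketTextStream("localhost", 9999).map(t => (t, 1L))

    // 5-minute window recomputed every 30 seconds -- the slide duration
    // just has to be a multiple of the 10-second batch interval.
    val windowedCounts = events.reduceByKeyAndWindow(_ + _, Minutes(5), Seconds(30))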
On 16 Sep 2015, at 23:30, Ted Yu <yuzhih...@gmail.com> wrote:

bq. and check if 5 minutes have passed

What if the duration for the window is longer than 5 minutes?

Cheers

On Wed, Sep 16, 2015 at 1:25 PM, Adrian Tanase <atan...@adobe.com> wrote:

If you don't need the counts in between the DB writes, you could simply use a 5-minute window for the updateStateByKey and use foreachRDD on the resulting DStream. Even simpler, you could use reduceByKeyAndWindow directly.

Lastly, you could keep a variable on the driver and check whether 5 minutes have passed in foreachRDD on the original DStream, even if the batch duration is shorter.

Also, remember to clean up the state in your updateStateByKey function or it will grow unbounded. I still believe one of the built-in ByKey functions is the simpler strategy.

Hope this helps,
-adrian

> On 16 Sep 2015, at 22:33, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Hello.
>
> I have a streaming job that is processing data. I process a stream of
> events, taking action when I see anomalous events. I also keep a count of
> events observed, using updateStateByKey to maintain a map of type to count.
> I would like to periodically (every 5 minutes) write the results of my
> counts to a database. Is there a built-in mechanism or established pattern
> for executing periodic jobs in Spark Streaming?
>
> Regards,
>
> Bryan Jeffrey
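For reference, a rough sketch of the updateStateByKey cleanup and driver-side timer ideas discussed above, continuing the hypothetical stream from the earlier example; saveToDatabase is a made-up placeholder, not a Spark API:

    // updateStateByKey needs a checkpoint directory.
    ssc.checkpoint("/tmp/checkpoint") // placeholder path

    // Keep (count, lastSeen) per key; returning None drops a key from the
    // state, so it does not grow unbounded.
    val counts = events.updateStateByKey[(Long, Long)] {
      (newValues: Seq[Long], state: Option[(Long, Long)]) =>
        val now = System.currentTimeMillis()
        val (count, lastSeen) = state.getOrElse((0L, now))
        if (newValues.nonEmpty) Some((count + newValues.sum, now))
        else if (now - lastSeen > 10 * 60 * 1000) None // drop keys idle > 10 min
        else Some((count, lastSeen))
    }

    // Stand-in for the real DB write.
    def saveToDatabase(rows: Array[(String, Long)]): Unit =
      rows.foreach { case (k, n) => println(s"$k -> $n") }

    // Driver-side timer: the function passed to foreachRDD runs on the
    // driver, so a plain var works even with a short batch duration.
    var lastWrite = System.currentTimeMillis()
    counts.foreachRDD { rdd =>
      val now = System.currentTimeMillis()
      if (now - lastWrite >= 5 * 60 * 1000) {
        saveToDatabase(rdd.mapValues(_._1).collect()) // write (type, count) pairs
        lastWrite = now
      }
    }

    ssc.start()
    ssc.awaitTermination()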