The window can be larger, but the batch/slide interval has to be smaller (presumably every 5-10 seconds?). Most of the built-in window functions take a separate slide-duration parameter that you can override, as long as it's a multiple of the streaming context's batch interval.
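For example, here is a minimal sketch of overriding the slide duration; the socket source, app name, and durations are hypothetical, purely for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    // Hypothetical 10-second batch interval.
    val conf = new SparkConf().setAppName("PeriodicCounts")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: one event type per line.
    val events = ssc.socketTextStream("localhost", 9999).map(t => (t, 1L))

    // 5-minute window recomputed every 30 seconds -- the slide duration
    // just has to be a multiple of the 10-second batch interval.
    val windowedCounts = events.reduceByKeyAndWindow(_ + _, Minutes(5), Seconds(30))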
On 16 Sep 2015, at 23:30, Ted Yu <yuzhih...@gmail.com> wrote:

bq. and check if 5 minutes have passed

What if the duration for the window is longer than 5 minutes?

Cheers

On Wed, Sep 16, 2015 at 1:25 PM, Adrian Tanase <atan...@adobe.com> wrote:

If you don't need the counts in between the DB writes, you could simply use a 5-minute window for the updateStateByKey and use foreachRDD on the resulting DStream. Even simpler, you could use reduceByKeyAndWindow directly.

Lastly, you could keep a variable on the driver and check whether 5 minutes have passed in foreachRDD on the original DStream, even if the batch duration is shorter.

Also, remember to clean up the state in your updateStateByKey function or it will grow unbounded. I still believe one of the built-in ByKey functions is the simpler strategy.

Hope this helps,
-adrian

> On 16 Sep 2015, at 22:33, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Hello.
>
> I have a streaming job that is processing data. I process a stream of
> events, taking action when I see anomalous events. I also keep a count of
> events observed, using updateStateByKey to maintain a map of type to count.
> I would like to periodically (every 5 minutes) write the results of my
> counts to a database. Is there a built-in mechanism or established pattern
> for executing periodic jobs in Spark Streaming?
>
> Regards,
>
> Bryan Jeffrey
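For reference, a rough sketch of the updateStateByKey cleanup and driver-side timer ideas discussed above, continuing the hypothetical stream from the earlier example; saveToDatabase is a made-up placeholder, not a Spark API:

    // updateStateByKey needs a checkpoint directory.
    ssc.checkpoint("/tmp/checkpoint") // placeholder path

    // Keep (count, lastSeen) per key; returning None drops a key from the
    // state, so it does not grow unbounded.
    val counts = events.updateStateByKey[(Long, Long)] {
      (newValues: Seq[Long], state: Option[(Long, Long)]) =>
        val now = System.currentTimeMillis()
        val (count, lastSeen) = state.getOrElse((0L, now))
        if (newValues.nonEmpty) Some((count + newValues.sum, now))
        else if (now - lastSeen > 10 * 60 * 1000) None // drop keys idle > 10 min
        else Some((count, lastSeen))
    }

    // Stand-in for the real DB write.
    def saveToDatabase(rows: Array[(String, Long)]): Unit =
      rows.foreach { case (k, n) => println(s"$k -> $n") }

    // Driver-side timer: the function passed to foreachRDD runs on the
    // driver, so a plain var works even with a short batch duration.
    var lastWrite = System.currentTimeMillis()
    counts.foreachRDD { rdd =>
      val now = System.currentTimeMillis()
      if (now - lastWrite >= 5 * 60 * 1000) {
        saveToDatabase(rdd.mapValues(_._1).collect()) // write (type, count) pairs
        lastWrite = now
      }
    }

    ssc.start()
    ssc.awaitTermination()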