Re: Streaming 2.1.0 - window vs. batch duration

2017-03-18 Thread Dominik Safaric
If I were to set the window duration to 60 seconds, with a batch interval of one second and a slide duration of 59 seconds, I would get the desired behaviour. However, would the Receiver pull messages from Kafka only at the 59-second slide interval, or would it constantly pull the …
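For concreteness, the variant being asked about might look like the sketch below; `values` stands for the DStream of message values obtained from Kafka and is an assumption, since the original code is not shown in the archive. Whether the receiver-based or the direct approach is used, data keeps arriving every 1-second batch (or continuously, for a receiver); the slide duration only controls how often a windowed RDD is generated from batches already received.

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    // `values` is assumed to be the DStream of message values read from Kafka.
    // The 59-second slide must be a multiple of the 1-second batch interval
    // (it is), and it only changes how often the windowed RDD is emitted;
    // ingestion from Kafka still happens batch by batch.
    def windowedOutput(values: DStream[String]): Unit =
      values.window(Seconds(60), Seconds(59)).foreachRDD { rdd =>
        println(s"records in the last 60 seconds: ${rdd.count()}")
      }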

Re: Streaming 2.1.0 - window vs. batch duration

2017-03-18 Thread Dominik Safaric
Correct - that is the part I understood well. However, what alternative transformation might I apply to iterate through the RDDs, considering a window duration of 60 seconds which I cannot change? > On 17 Mar 2017, at 16:57, Cody Koeninger wrote: > > Probably easier if you show some m…
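One way to answer that question, sketched below under the assumption that `values` is the DStream of Kafka message values: keep the 60-second window duration but give it an explicit 60-second slide, so foreachRDD fires once per minute over non-overlapping data.

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    // Tumbling window: same 60-second window duration, but sliding by its own
    // length, so each record is handed to foreachRDD exactly once per minute.
    def iterateOncePerMinute(values: DStream[String]): Unit =
      values.window(Seconds(60), Seconds(60)).foreachRDD { rdd =>
        rdd.foreach(println)
      }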

Re: Streaming 2.1.0 - window vs. batch duration

2017-03-17 Thread Cody Koeninger
Probably easier if you show some more code, but if you just call dstream.window(Seconds(60)) without specifying a slide duration, it's going to default to your batch duration of 1 second. So yeah, if you're just using e.g. foreachRDD to output every message in the window, every second it's going …
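A compact illustration of the point, again assuming a DStream of Kafka message values named `values`: leaving out the slide duration is equivalent to sliding once per batch.

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    def showDefaultSlide(values: DStream[String]): Unit = {
      // With a 1-second batch interval these two are equivalent: the slide
      // defaults to the batch duration, so a fresh 60-second window is
      // emitted every second and each message appears in roughly 60 windows.
      val implicitSlide = values.window(Seconds(60))
      val explicitSlide = values.window(Seconds(60), Seconds(1))

      // Hence foreachRDD sees the whole last minute of data every second.
      implicitSlide.foreachRDD(rdd => println(s"window size: ${rdd.count()}"))
    }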

Re: Streaming 2.1.0 - window vs. batch duration

2017-03-16 Thread Michael Armbrust
Have you considered trying event-time aggregation in Structured Streaming instead? On Thu, Mar 16, 2017 at 12:34 PM, Dominik Safaric wrote: > Hi all, > > As I’ve implemented a streaming application pulling data from Kafka every > 1 second (batch interval), I am observing some quite strange behav…
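A minimal sketch of the Structured Streaming alternative being suggested, assuming a Kafka source and a 60-second event-time window; the topic name and broker address are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.window

    val spark = SparkSession.builder.appName("EventTimeWindows").getOrCreate()
    import spark.implicits._

    // The Kafka source exposes the record timestamp as a `timestamp` column.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder broker
      .option("subscribe", "events")                          // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")

    // Group by a 60-second event-time window rather than by arrival batch.
    val counts = events
      .groupBy(window($"timestamp", "60 seconds"))
      .count()

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()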

Streaming 2.1.0 - window vs. batch duration

2017-03-16 Thread Dominik Safaric
Hi all, As I’ve implemented a streaming application pulling data from Kafka every 1 second (batch interval), I am observing some quite strange behaviour (I haven’t used Spark extensively in the past, but continuous operator based engines instead). Namely, the dstream.window(Seconds(60)) windowe…
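For reference, a minimal sketch of the scenario described above (a 1-second batch interval, a direct Kafka stream, and a 60-second window); the broker address, topic name, and group id are placeholders, since the original code is not shown in the archive.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    val conf = new SparkConf().setAppName("WindowVsBatch")
    val ssc  = new StreamingContext(conf, Seconds(1))   // 1-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",          // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "window-demo",             // placeholder group id
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Map to plain values before windowing; ConsumerRecord itself is not
    // serializable, which matters once records are held across batches.
    val values = stream.map(_.value)

    // A 60-second window with the slide duration left unspecified.
    values.window(Seconds(60)).foreachRDD { rdd =>
      println(s"records currently in window: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()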