If I were to set the window duration to 60 seconds, with a batch interval of
one second and a slide duration of 59 seconds, I would get the desired
behaviour.
However, would the Receiver pull messages from Kafka only at the 59th-second
slide interval, or would it constantly pull them at every batch interval?
Correct - that is the part that I understood nicely.
However, what alternative transformation might I apply to iterate through the
RDDs considering a window duration of 60 seconds which I cannot change?
> On 17 Mar 2017, at 16:57, Cody Koeninger wrote:
Probably easier if you show some more code, but if you just call
dstream.window(Seconds(60))
you didn't specify a slide duration, so it's going to default to your
batch duration of 1 second.
So yeah, if you're just using e.g. foreachRDD to output every message
in the window, then every second it's going to output the entire last
60 seconds' worth of messages.
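The overlap Cody describes can be sketched without Spark. The following is a plain-Scala simulation (the `WindowSim` object and its helper are illustrative, not Spark API) of how many windows each batch lands in: with the default slide of one batch, a 60-second window sees each batch roughly 60 times, whereas an explicit slide equal to the window length, i.e. `dstream.window(Seconds(60), Seconds(60))`, yields tumbling windows where each batch appears exactly once.

```scala
object WindowSim {
  // Model a DStream as one element per 1-second batch, and a window
  // as the range of batch indices it covers. Both winLen and slide
  // are expressed in batches (here, seconds).
  def windows(totalBatches: Int, winLen: Int, slide: Int): Seq[Range] =
    (winLen to totalBatches by slide).map(end => (end - winLen) until end)

  def main(args: Array[String]): Unit = {
    val total = 120
    // Default slide = batch interval (1 s): heavily overlapping windows.
    val overlapping = windows(total, 60, 1)
    println(s"batch 60 is in ${overlapping.count(_.contains(60))} sliding windows")
    // Explicit slide = window length: tumbling, non-overlapping windows.
    val tumbling = windows(total, 60, 60)
    println(s"batch 60 is in ${tumbling.count(_.contains(60))} tumbling window(s)")
  }
}
```

This is a simplification (real windows are cut at batch boundaries by the streaming clock), but it shows why outputting the window contents every second repeats each message about 60 times under the default slide.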
Have you considered trying event time aggregation in structured streaming
instead?
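Event-time aggregation, as suggested above, assigns each record to a window by the record's own timestamp rather than by the batch in which it arrived. A minimal plain-Scala sketch of that bucketing follows; the `Event` case class and its field names are made up for illustration, and in actual Structured Streaming this would be expressed with something like `df.groupBy(window($"timestamp", "60 seconds")).count()`.

```scala
// Hypothetical record type: a key plus an event-time timestamp in ms.
final case class Event(key: String, timestampMs: Long)

object EventTimeSim {
  // Assign each event to a tumbling 60-second bucket based on its own
  // timestamp (event time), independent of arrival order or batch.
  def bucketCounts(events: Seq[Event], windowMs: Long = 60000L): Map[Long, Int] =
    events.groupBy(e => e.timestampMs / windowMs).map { case (b, es) => (b, es.size) }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 1000L), Event("b", 59999L), Event("c", 60000L))
    // bucket 0 covers [0 s, 60 s) and holds 2 events; bucket 1 holds 1.
    println(bucketCounts(events))
  }
}
```

The point of the suggestion: each event is counted in exactly one 60-second window regardless of the processing-time batch interval, which sidesteps the window/slide bookkeeping entirely.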
On Thu, Mar 16, 2017 at 12:34 PM, Dominik Safaric wrote:
Hi all,
As I’ve implemented a streaming application pulling data from Kafka every
second (batch interval), I am observing some quite strange behaviour (I didn’t
use Spark extensively in the past, but continuous operator based engines
instead).
Namely the dstream.window(Seconds(60)) windowe