Hello,

I have a question that has been bugging me.
Let's say we have a Kafka Source.
Checkpointing is enabled, with an interval of 5 seconds.
We use a file-system state backend (FsStateBackend, on Hadoop).
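
For reference, here is a minimal sketch of that setup, assuming the Scala
streaming API. The object name and the HDFS URI are made-up placeholders for
illustration, not our real values.

```scala
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.scala._

// Hypothetical wrapper object, only here to make the sketch self-contained.
object CheckpointSetup {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Trigger a checkpoint every 5 seconds (the interval is given in milliseconds).
    env.enableCheckpointing(5000)

    // Keep checkpoint data on a Hadoop file system.
    // Placeholder URI: replace with a real checkpoint directory.
    env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"))
  }
}
```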

Now imagine we have a tumbling window of 10 minutes.

For simplicity, let's say that we are counting all elements
arriving within those 10 minutes. Something like this:

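import org.apache.flink.api.common.functions.FoldFunction

// Counts elements: every incoming Event increments the accumulator by one.
class Count extends FoldFunction[Event, Long] {

   override def fold(accumulator: Long, value: Event): Long = {
     accumulator + 1
   }

}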

So we have

source
     .window(<Tumbling>)
     .apply(0L, new Count(), <WindowFunction>)

In the first 2 minutes, 10 events arrive. Then we stop the
stream/task/job, or it fails, and it is restarted. What will the
state of the fold function be?
Will it be 10, so that it resumes from there? Or will it be 0?

This is important to know: imagine we have a window of 1 day
and the job fails midday. How will it resume?

Best Regards
