Hi Daniel,
Flink will checkpoint the state of all operations (in your case to HDFS).
Flink has several APIs for dealing with state in user functions:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/state.html The
window operator also internally uses these APIs.

Let me know if you need anything more.

Cheers,
Aljoscha

On Thu, 3 Nov 2016 at 19:43 Daniel Santos <dsan...@cryptolab.net> wrote:

> Hello,
>
> I have some question that has been bugging me.
> Let's say we have a Kafka Source.
> Checkpoint is enabled, with a period of 5 seconds.
> We have a FSBackend ( Hadoop ).
>
> Now imagine we have a window a tumbling of 10 Minutes.
>
> For simplicity we are going to say that we are counting all elements
> arrinving in 10 Minutes. Something like this.
>
> class Count extends FoldFunction[Event, Long] {
>
>     override def fold(accumulator: Long, value: Event): Long =  {
>       accumulator + 1
>     }
>
> }
>
> So we have
>
> source.
>       window(<Tumbling>).
>       apply(0, Count(), WindowFunction())
>
> In the first 2 Minutes arrives 10 events, then we stop the
> stream/task/job or it fails and then it is restarted, what will be the
> state of the fold function ?
> Will it be 10 and it will resume from there ? Or will it be 0 ?
>
> It is kinda important to know because imagine we have a Window of 1 day.
> And the job fails mid day. How will it resume ?
>
> Best Regards
>
>

Reply via email to