Hi Daniel, Flink will checkpoint the state of all operations (in your case to HDFS). Flink has several APIs for dealing with state in user functions: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/state.html The window operator also internally uses these APIs.
Let me know if you need anything more. Cheers, Aljoscha On Thu, 3 Nov 2016 at 19:43 Daniel Santos <dsan...@cryptolab.net> wrote: > Hello, > > I have some question that has been bugging me. > Let's say we have a Kafka Source. > Checkpoint is enabled, with a period of 5 seconds. > We have a FSBackend ( Hadoop ). > > Now imagine we have a window a tumbling of 10 Minutes. > > For simplicity we are going to say that we are counting all elements > arrinving in 10 Minutes. Something like this. > > class Count extends FoldFunction[Event, Long] { > > override def fold(accumulator: Long, value: Event): Long = { > accumulator + 1 > } > > } > > So we have > > source. > window(<Tumbling>). > apply(0, Count(), WindowFunction()) > > In the first 2 Minutes arrives 10 events, then we stop the > stream/task/job or it fails and then it is restarted, what will be the > state of the fold function ? > Will it be 10 and it will resume from there ? Or will it be 0 ? > > It is kinda important to know because imagine we have a Window of 1 day. > And the job fails mid day. How will it resume ? > > Best Regards > >