Yes, you are correct, Dominik. The committed Kafka offsets tell you what the program has read as input from the Kafka topic. But depending on the actual program logic, this does not mean that the results of processing these input events have been output up to this point. As you said, there are Flink operations, such as window calculations, which need to buffer events for a certain period before they can emit their results.
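To make this concrete, here is a minimal sketch — plain Python, not Flink code; the events, window size, and checkpoint position are made up — showing how a checkpoint can record an offset ahead of the emitted output, because the buffered-but-unemitted events live in the checkpointed window state rather than in the sink:

```python
# Simulate a source feeding a tumbling count window, with one checkpoint.
events = [1, 2, 3, 4, 5, 6]  # hypothetical input stream
WINDOW_SIZE = 4              # emit a sum every 4 events (assumed window)

window_buffer = []
emitted = []
checkpoint = None

for offset, value in enumerate(events):
    window_buffer.append(value)
    if len(window_buffer) == WINDOW_SIZE:
        emitted.append(sum(window_buffer))
        window_buffer.clear()
    if offset == 4:  # pretend a checkpoint completes here
        # The checkpoint stores BOTH the read offset and the window state,
        # so buffered-but-unemitted events are not lost on restart.
        checkpoint = {"offset": offset + 1, "window_state": list(window_buffer)}

# At the checkpoint, 5 events were read but only one window result was
# emitted; the 5th event exists only in the checkpointed window state.
print(checkpoint)  # {'offset': 5, 'window_state': [5]}
print(emitted)     # [10]
```

This is why replaying from the committed offset after a failure is safe: the events "missing" from the output are restored as part of the window state, not re-read from Kafka.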
Cheers,
Till

On Mon, Sep 9, 2019 at 4:54 AM Paul Lam <paullin3...@gmail.com> wrote:

> Hi Dom,
>
> There are a sync phase and an async phase in checkpointing. When an
> operator receives a barrier, it performs a snapshot, aka the sync phase.
> And when the barriers have passed through all the operators, including
> sinks, the operators get a notification, after which they do the async
> part, like committing the Kafka offsets. WRT your question, the offsets
> are only committed once the whole checkpoint has successfully finished.
> For more information, you can refer to this post[1].
>
> [1]
> https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
>
> Best,
> Paul Lam
>
> > On Sep 9, 2019, at 07:06, Dominik Wosiński <wos...@gmail.com> wrote:
> >
> > Okay, thanks for clarifying. I have a follow-up question here. If we
> > consider Kafka offset commits, this basically means that the offsets
> > committed during the checkpoint are not necessarily the offsets that
> > were really processed by the pipeline and written to the sink? I mean,
> > if there is a window in the pipeline, then the records are saved in the
> > window state if the window has not been emitted yet, but they are
> > considered as processed and thus will not be replayed in case of a
> > restart, because Flink considers them as processed when committing
> > offsets. Am I correct?
> >
> > Best,
> > Dom.