That would be good indeed. I just learned about it from Stephan mentioned. It sounds correct to me along the lines but it would be nice to see the details.
> On 05 Feb 2016, at 13:32, Ufuk Celebi <u...@apache.org> wrote: > > >> On 05 Feb 2016, at 13:28, Paris Carbone <par...@kth.se> wrote: >> >> Hi Gabor, >> >> The sinks should aware that the global checkpoint is indeed persisted before >> emitting so they will have to wait until they are notified for its >> completion before pushing to Kafka. The current view of the local state is >> not the actual persisted view so checking against is like relying on dirty >> state. Imagine the following scenario: >> >> 1) sink pushes to kafka record k and updates local buffer for k >> 2) sink snapshots k and the rest of its state on checkpoint barrier >> 3) global checkpoint fails due to some reason (e.g. another sink subtask >> failed) and the job gets restarted >> 4) sink pushes again record k to kafka since the last global snapshots did >> not complete before and k is not in the local buffer >> >> Chesnay’s approach seems to be valid and best effort for the time being. > > Chesnay’s approach is not part of this thread. Can you or Chesnay > elaborate/provide a link? > > – Ufuk >