> On 05 Feb 2016, at 13:28, Paris Carbone <par...@kth.se> wrote: > > Hi Gabor, > > The sinks should aware that the global checkpoint is indeed persisted before > emitting so they will have to wait until they are notified for its completion > before pushing to Kafka. The current view of the local state is not the > actual persisted view so checking against is like relying on dirty state. > Imagine the following scenario: > > 1) sink pushes to kafka record k and updates local buffer for k > 2) sink snapshots k and the rest of its state on checkpoint barrier > 3) global checkpoint fails due to some reason (e.g. another sink subtask > failed) and the job gets restarted > 4) sink pushes again record k to kafka since the last global snapshots did > not complete before and k is not in the local buffer > > Chesnay’s approach seems to be valid and best effort for the time being.
Chesnay’s approach is not part of this thread. Can you or Chesnay elaborate/provide a link? – Ufuk