Re: How to ensure exactly-once semantics in output to Kafka?

Ufuk Celebi Fri, 05 Feb 2016 04:32:58 -0800

> On 05 Feb 2016, at 13:28, Paris Carbone <par...@kth.se> wrote:
> 
> Hi Gabor,
> 
> The sinks should aware that the global checkpoint is indeed persisted before 
> emitting so they will have to wait until they are notified for its completion 
> before pushing to Kafka. The current view of the local state is not the 
> actual persisted view so checking against is like relying on dirty state. 
> Imagine the following scenario:
> 
> 1) sink pushes to kafka record k and updates local buffer for k
> 2) sink snapshots k and the rest of its state on checkpoint barrier
> 3) global checkpoint fails due to some reason (e.g. another sink subtask 
> failed) and the job gets restarted
> 4) sink pushes again record k to kafka since the last global snapshots did 
> not complete before and k is not in the local buffer
> 
> Chesnay’s approach seems to be valid and best effort for the time being.


Chesnay’s approach is not part of this thread. Can you or Chesnay 
elaborate/provide a link?

– Ufuk

Re: How to ensure exactly-once semantics in output to Kafka?

Reply via email to