Hi,

we are working on a Flink pipeline and are running into duplicate records in the case of checkpoint failures.

The pipeline runs on Flink 1.13.2 and uses the source and sink classes from the Flink Kafka connector library.

Checkpointing is set to exactly-once mode, and we care much more about correctness of the data than about throughput.
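For context, here is a minimal sketch of how the pipeline is wired. The topic names, group id, schemas, bootstrap servers and checkpoint interval are placeholders, the real job contains more transformations, and the sink constructor shown is a simplification of our actual setup:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class PipelineSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpointing is enabled in exactly-once mode; the interval is a placeholder.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        kafkaProps.setProperty("group.id", "bi-pipeline");         // placeholder

        // Source and sink classes from the flink-connector-kafka library.
        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), kafkaProps);

        // This constructor uses the connector's default delivery semantics.
        FlinkKafkaProducer<String> sink =
                new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), kafkaProps);

        DataStream<String> stream = env.addSource(source);
        // ... transformations omitted ...
        stream.addSink(sink);

        env.execute("kafka -> flink -> kafka");
    }
}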

We observe that upon recovery Flink rereads all records from the offsets stored in the last successful checkpoint. As a result, the output records that were already produced between the last checkpoint and the failure are generated a second time.

How can we achieve an end-to-end exactly-once guarantee in our Kafka -> Flink -> Kafka pipeline, so that we no longer get duplicate records and also avoid data loss?
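To make the question more concrete: would a sink configured along the following lines, combined with isolation.level=read_committed on the consumers of the output topic, be the right direction? This would replace the sink in the sketch above; the property values and topic name are placeholders:

// Additional imports on top of the sketch above:
// import java.nio.charset.StandardCharsets;
// import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
// import org.apache.kafka.clients.producer.ProducerRecord;

Properties producerProps = new Properties();
producerProps.setProperty("bootstrap.servers", "kafka:9092");   // placeholder
// Transactions stay open between checkpoints, so this has to cover the
// checkpoint interval and must not exceed the broker's transaction.max.timeout.ms.
producerProps.setProperty("transaction.timeout.ms", "900000");  // placeholder

FlinkKafkaProducer<String> transactionalSink = new FlinkKafkaProducer<>(
        "output-topic",                                          // placeholder
        (KafkaSerializationSchema<String>) (element, timestamp) ->
                new ProducerRecord<>("output-topic",
                        element.getBytes(StandardCharsets.UTF_8)),
        producerProps,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

stream.addSink(transactionalSink);

Our understanding is that with this setup records only become visible downstream once the corresponding checkpoint completes, which would be acceptable for us since throughput is not the main concern. Please correct us if we are missing something.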

Many thanks in advance!

Patrick

--
Patrick Eifler

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure
Sony Interactive Entertainment LLC

Wilhelmstraße 118, 10963 Berlin

Germany

E: patrick.eif...@sony.com
