I’ve been investigating a data duplication issue in a Kinesis -> Flink -> Kafka exactly once setup.
Found out that at the stop with savepoint next things happen: * The Kafka transaction is committed, the last processed events being written * The Kinesis sequence number is written in the savepoint _metadata file. The problem is that Kinesis connector does not write to the savepoint file the sequence number corresponding to the last event written into Kafka! Instead, it writes the same sequence number as the one from the last checkpoint. That way, after resuming the job from the savepoint, it duplicates all the records between the last checkpoint and the savepoint (were committed to Kafka at the savepoint). I believe that’s a kinesis connector issue because I tried the Kafka -> Flink -> Kafka setup and the erroneous behavior does not reproduce. Have anybody reproduces such a situation?