Hi Amara, how are you validating whether or not you have duplicates in your output?
If you are just writing the output to another Kafka topic or printing it to standard out, you will see duplicates even when exactly-once works. Flink does not provide exactly-once delivery to external systems; it provides exactly-once semantics for registered state. This means the sink needs to cooperate with the system to achieve end-to-end exactly-once. For files, for example, you need to remove invalid data left behind by previously failed checkpoints; our BucketingSink does exactly that. A minimal sketch of such a setup is below the quoted message.

On Tue, May 30, 2017 at 9:01 AM, F.Amara <fath...@wso2.com> wrote:

> Hi Gordon,
>
> Thanks a lot for the reply.
> The events are produced using a KafkaProducer, submitted to a topic and
> thereby consumed by the Flink application using a FlinkKafkaConsumer. I
> verified that during a failure recovery scenario (of the Flink application)
> the KafkaProducer was not interrupted, so no duplicated values were sent
> from the data source. I observed the output from the FlinkKafkaConsumer
> and noticed duplicates starting from that point onwards.
> Is the FlinkKafkaConsumer capable of introducing duplicates?
>
> How can I implement exactly-once processing for my application? Could you
> please guide me on what I might have missed?
>
> Thanks,
> Amara
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Duplicated-data-when-using-Externalized-Checkpoints-in-a-Flink-Highly-Available-cluster-tp13301p13379.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
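
For reference, this is roughly what a cooperating setup looks like. A minimal sketch only, assuming the Kafka 0.10 connector and the filesystem BucketingSink; the topic name, bootstrap servers, group id, and output path are all placeholders:

import java.util.Properties;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ExactlyOnceToFiles {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 5s; this gives exactly-once guarantees for
        // registered state, not for external delivery.
        env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-group");

        // The consumer registers its Kafka offsets as Flink state, so they
        // are rewound consistently when the job recovers from a checkpoint.
        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>("input-topic", new SimpleStringSchema(), props);

        // The BucketingSink cooperates with checkpointing: files are only
        // finalized once the checkpoint covering them completes, and data
        // written by failed checkpoints is discarded on restore.
        BucketingSink<String> sink = new BucketingSink<>("/tmp/flink-output");

        env.addSource(consumer).addSink(sink);
        env.execute("Exactly-once Kafka to files");
    }
}

By contrast, printing to standard out or writing to Kafka with a plain (non-transactional) producer sink is at-least-once, so records replayed after a failure show up as duplicates even though the job's state itself is exactly-once.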