Hi Amara,

please refer to [1] for the details of our checkpointing mechanism. In short, for your situation:
* checkpoints are made at certain checkpoint barriers,
* in between those barriers, processing continues and so do outputs,
* in case of a failure, the state at the latest completed checkpoint is restored,
* processing then restarts from there, so you will see the same outputs again.

You seem not to deliver to Kafka but only consume from it and write to a CSV file. If this write were transactional, you would commit only at each checkpoint barrier and never see the "duplicates", i.e. the uncommitted events; see the sketch after the quoted message below.

Regards,
Nico

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html

On Monday, 5 June 2017 08:55:05 CEST F.Amara wrote:
> Hi Robert,
>
> I have a few more questions to clarify.
>
> 1) Why do you say printing the values to standard out would display
> duplicates even if exactly-once works? What is the reason for this? Could
> you brief me on this?
>
> 2) I observed duplicates (by writing to a file) starting from the
> FlinkKafkaConsumer onwards. Why does this component introduce duplicates? Is
> it because Kafka only guarantees at-least-once delivery at the moment?
>
> Thanks,
> Amara
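In case it helps to see the commit-at-barrier idea in code, below is a minimal, hypothetical Java sketch (not from the Flink docs or from Amara's job; all class and job names are made up). It enables exactly-once checkpointing and uses a sink that buffers output between barriers, only "committing" it once the checkpoint completes. It is deliberately simplified: a real implementation would also snapshot the buffer into checkpointed state so nothing is lost between the snapshot and the commit (compare Flink's GenericWriteAheadSink).

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.runtime.state.CheckpointListener;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class CommitAtBarrierExample {

    // Hypothetical sink: records are buffered and only made visible
    // ("committed") once the checkpoint they belong to has completed.
    public static class BufferingCsvSink extends RichSinkFunction<String>
            implements CheckpointListener {

        private final List<String> pending = new ArrayList<>();

        @Override
        public void invoke(String value) {
            // Output produced between two barriers: not yet visible, so a
            // replay after a failure cannot produce visible duplicates.
            pending.add(value);
        }

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            // "Commit": append the buffered records to the CSV file here,
            // then forget them.
            pending.clear();
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Inject checkpoint barriers every 10 seconds; on failure, Flink
        // restores the latest completed checkpoint and replays from there.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

        // Toy source standing in for the FlinkKafkaConsumer; a plain
        // writeAsCsv()/print() sink here is NOT transactional, which is
        // exactly why replayed records show up twice.
        env.fromElements("a", "b", "c")
                .addSink(new BufferingCsvSink());

        env.execute("commit-at-barrier-example");
    }
}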