Re: Exact-once processing when a job fails

2022-01-06 Thread Sharon Xie
Thanks again for the detailed info. It makes a lot of sense. One last question: can I create a checkpoint as soon as a job starts? In that case, the first record’s offset will be in the checkpoint state, and this will provide the “strong” guarantee you described. Did I miss anything? I read the code
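
For reference, a minimal sketch of how periodic checkpointing is typically enabled on the job side (the 60-second interval and the minimum pause are placeholder values; this only configures the regular checkpoint cycle, it is not a switch that forces a checkpoint at the instant the job starts):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    // Sketch: enable periodic exactly-once checkpoints for a streaming job.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);  // 60s is a placeholder interval
    // Optional: minimum pause between two consecutive checkpoints.
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500L);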

Re: Exact-once processing when a job fails

2022-01-05 Thread Caizhi Weng
Hi! Also note that although this eventual consistency may not seem good enough, for 99.99% of the time the job runs smoothly without failure, and in that case the records are correct. Only in the 0.01% case when the job fails will users see inconsistency for a small period of time (for a c

Re: Exact-once processing when a job fails

2022-01-05 Thread Caizhi Weng
Hi! Flink guarantees *eventual* consistency for systems without transactions (by transaction I mean a system that supports writing a few records and then committing them), or with transactions where users prefer latency over consistency. That is to say, everything produced by Flink before a checkpoint is "not secur
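
One way to observe this "not secured until the checkpoint commits" behavior from the consumer side: with an exactly-once Kafka sink, records are written inside Kafka transactions that are only committed when a checkpoint completes, so a downstream consumer using read_committed isolation will not see them before that. A minimal sketch with the plain Kafka client (broker address, group id, and topic name are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder
    props.setProperty("group.id", "downstream-reader");         // placeholder
    props.setProperty("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.setProperty("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    // Only records whose transaction has been committed (i.e. produced
    // before a completed checkpoint) become visible to this consumer.
    props.setProperty("isolation.level", "read_committed");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("sink-topic")); // placeholder
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));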

Re: Exact-once processing when a job fails

2022-01-04 Thread Sharon Xie
Hi Caizhi, Thank you for the quick response. Can you help me understand how reprocessing the data from the earliest starting-offset ensures exactly-once processing? First, the earliest offset could be way beyond the 1st record in my example, since the first time the job started from the latest offset

Re: Exact-once processing when a job fails

2022-01-04 Thread Caizhi Weng
Hi! This is a valid case. This starting-offset is the offset the Kafka source reads from when the job starts *without a checkpoint*. That is to say, if your job has been running for a while, completed several checkpoints and then restarted, the Kafka source won't read from the starting-offset, but from th
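
In connector terms, the behavior described above corresponds to the starting-offsets setting of the Kafka source builder: it only applies on a fresh start, while a restart from a checkpoint or savepoint resumes from the offsets stored in state. A minimal sketch (broker address, topic, and group id are placeholders):

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

    KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")   // placeholder
        .setTopics("source-topic")               // placeholder
        .setGroupId("my-flink-job")              // placeholder
        // Used only when the job starts WITHOUT restoring from a checkpoint;
        // on recovery the source resumes from the offsets in checkpoint state.
        .setStartingOffsets(OffsetsInitializer.latest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();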

Exact-once processing when a job fails

2022-01-04 Thread Sharon Xie
Can someone help me understand how Flink deals with the following scenario? I have a job that reads from a source Kafka (starting-offset: latest) and writes to a sink Kafka with exactly-once execution. Let's say that I have 2 records in the source. The 1st one is processed without issue and the jo
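
For readers reconstructing the "exactly-once sink" half of this scenario, a minimal sketch of the sink configuration, assuming the Flink 1.14-era KafkaSink builder (broker address, topic, and transactional-id prefix are placeholders; the guarantee also requires checkpointing to be enabled on the job):

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.base.DeliveryGuarantee;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;

    KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")                 // placeholder
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
            .setTopic("sink-topic")                            // placeholder
            .setValueSerializationSchema(new SimpleStringSchema())
            .build())
        // Records are written inside Kafka transactions that are committed
        // when a checkpoint completes.
        .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("my-exactly-once-job")       // placeholder
        .build();

The sink would then be attached to the pipeline with stream.sinkTo(sink) on a job that has checkpointing enabled.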