Dear Flink team,
I have some ambiguity when it comes to Flink's exactly-once guaranteeing.
1. Based on what I understand, when a failure occurs, some events will be
replayed which causes them to appear twice in the computations. I cannot
realize how the two-phase commit protocol can avoid this problem and
guarantee exactly-once. I expected to find a description of a mechanism for
detecting and ignoring duplicate events in the documentation, although I
got the two-phase commit protocol issuing something utterly different.
2. Regarding the event-time processing and watermarking, I have got that if
an event will be received late, after the allowed lateness time, it will be
dropped even though I think it is an antithesis of exactly-once semantic.
I will be thankful if I receive your valuable description so that I can
remove my ambiguities.
Yours faithfully,