Hi,

I'm working on a custom sink implementation that I would like to use
with exactly-once semantics. To that end I have extended the
TwoPhaseCommitSinkFunction class, as described in this recent post:
https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html

I have some integration tests which run jobs using the custom sink with a
finite dataset (a RichSourceFunction with a "finite" run method). The tests
fail because of missing data. I noticed that this is due to the last
transaction being aborted.

Looking at the source code, that makes sense: the close() implementation
of TwoPhaseCommitSinkFunction calls abort on the current transaction:
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.java
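
To make the failure concrete, here is a minimal plain-Java sketch of the
lifecycle as I understand it. It does not use Flink at all: SketchSink and
runFiniteJob are hypothetical names, and the model assumes that close()
only aborts the transaction that is still open. Treat it as an
illustration of the symptom, not as the actual Flink implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class FiniteJobSketch {

    /** Hypothetical stand-in for a two-phase-commit sink; NOT the Flink class. */
    static class SketchSink {
        final List<String> committed = new ArrayList<>(); // records visible downstream
        final List<String> aborted = new ArrayList<>();   // records rolled back (lost)
        // Pre-committed transactions, keyed by the checkpoint that closed them.
        private final TreeMap<Long, List<String>> pending = new TreeMap<>();
        private List<String> current = new ArrayList<>(); // the open transaction

        // Write a record into the open transaction.
        void invoke(String value) {
            current.add(value);
        }

        // Checkpoint barrier: pre-commit the open transaction, begin a new one.
        void snapshotState(long checkpointId) {
            pending.put(checkpointId, current);
            current = new ArrayList<>();
        }

        // Checkpoint confirmed: commit everything pre-committed up to that id.
        void notifyCheckpointComplete(long checkpointId) {
            NavigableMap<Long, List<String>> done = pending.headMap(checkpointId, true);
            for (List<String> txn : done.values()) {
                committed.addAll(txn);
            }
            done.clear(); // clearing the view removes the entries from 'pending'
        }

        // Shutdown: assumed to abort whatever is still in the open transaction.
        void close() {
            aborted.addAll(current);
            current = new ArrayList<>();
        }
    }

    // A finite job: two records, one completed checkpoint, one more record, then close.
    static SketchSink runFiniteJob() {
        SketchSink sink = new SketchSink();
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState(1L);
        sink.notifyCheckpointComplete(1L);
        sink.invoke("c"); // arrives after the last checkpoint
        sink.close();     // the source has finished, so the job shuts down
        return sink;
    }

    public static void main(String[] args) {
        SketchSink sink = runFiniteJob();
        System.out.println("committed = " + sink.committed);
        System.out.println("aborted   = " + sink.aborted);
    }
}
```

In this model "a" and "b" end up committed, while "c", which was written
after the last completed checkpoint, is aborted on close(). That matches
the missing data I see in my tests.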


I could override this behaviour and perform a commit in close(), but then
I would commit without having received the checkpoint-completed
notification, and so would no longer properly maintain the exactly-once
guarantee.

Is it possible (and if so, how) to have end-to-end exactly-once guarantees
when dealing with jobs that are sometimes finite?

Thanks!
Niels
