Thank you! I already have a custom source function, so adding the hacky solution would not be too much effort.
Looking forward to the "proper" solution!

Niels

On Fri, Mar 9, 2018, 16:00 Piotr Nowojski <pi...@data-artisans.com> wrote:

> Hi,
>
> Short answer is: no, at the moment a clean shutdown is not implemented for
> streaming, but it's on our to-do list for the future.
>
> Hacky answer: you could implement some custom code that would wait for at
> least one completed checkpoint after the last input data. But that would
> require modifying a source function, or at least wrapping it, and there
> might be some corner cases that I haven't thought about.
>
> Piotrek
>
>> On 9 Mar 2018, at 14:49, Niels van Kaam <ni...@vankaam.net> wrote:
>>
>> Hi,
>>
>> I'm working on a custom implementation of a sink which I would like to
>> use with exactly-once semantics. I have therefore extended the
>> TwoPhaseCommitSinkFunction class, as described in this recent post:
>> https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
>>
>> I have some integration tests which run jobs using the custom sink with
>> a finite dataset (a RichSourceFunction with a "finite" run method). The
>> tests fail because of missing data. I noticed that this is due to the
>> last transaction being aborted.
>>
>> Looking into the source code, that makes sense, because the close()
>> implementation of TwoPhaseCommitSinkFunction calls abort on the current
>> transaction:
>> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.java
>>
>> I could override this behaviour and perform a commit, but then I would
>> be committing without having received the checkpoint-completed
>> notification, and thus not properly maintaining the exactly-once
>> guarantee.
>>
>> Is it possible (and if so, how) to have end-to-end exactly-once
>> guarantees when dealing with (sometimes) finite jobs?
>>
>> Thanks!
>> Niels
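For what it's worth, Piotrek's "hacky" suggestion could be sketched roughly as below. The interfaces and class names here are hypothetical stand-ins, not Flink's actual SourceFunction/CheckpointListener signatures; the point is only the core idea: after the wrapped source finishes emitting, the wrapper keeps the source open until at least one checkpoint has completed, so the TwoPhaseCommitSinkFunction gets a chance to commit its final transaction before shutdown.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a finite source: emits all records, then returns.
interface SimpleSource {
    void run() throws Exception;
}

// Hypothetical stand-in for a checkpoint-completion callback.
interface CheckpointCompleteListener {
    void notifyCheckpointComplete(long checkpointId);
}

// Wraps a finite source and, after the wrapped run() returns, blocks until
// at least one checkpoint taken after the last record has completed.
class DrainOnCheckpointSource implements SimpleSource, CheckpointCompleteListener {
    private final SimpleSource inner;
    private final CountDownLatch checkpointAfterLastInput = new CountDownLatch(1);
    private volatile boolean inputFinished = false;

    DrainOnCheckpointSource(SimpleSource inner) {
        this.inner = inner;
    }

    @Override
    public void run() throws Exception {
        inner.run();                       // emit all data from the wrapped source
        inputFinished = true;
        checkpointAfterLastInput.await();  // hold the source open until a checkpoint completes
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // Only count checkpoints completed after the last record was emitted;
        // earlier checkpoints may not cover all of the input.
        if (inputFinished) {
            checkpointAfterLastInput.countDown();
        }
    }
}
```

This still relies on periodic checkpointing being enabled, and (as Piotrek warned) there may be corner cases, e.g. a checkpoint that was triggered before the last record but completes after it.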