Apologies for the late reply. I think this is badly needed, but I fear we are adding complexity by introducing two more stop commands. We would then have four: cancel, stop, terminate, and suspend. We basically want to do only two things: terminate a job with prejudice, or stop a job safely.
For the former, "cancel" is the appropriate term, and there should be no need for a cancel-with-checkpoint option. If the job was configured to use externalized checkpoints and it ran long enough, a checkpoint will be available for it.

For the latter, "stop" is the appropriate term. It means that the job should process no messages after the checkpoint barrier, and that it should ensure that exactly-once sinks complete their two-phase commits successfully. If a savepoint was requested, one should be created.

So in my mind there are two commands, cancel and stop, with appropriate semantics. Emitting MAX_WATERMARK before the checkpoint barrier during stop is merely an optional behavior, like the creation of a savepoint. But if a specific command for it is desired, then "drain" seems appropriate.

On Tue, Feb 12, 2019 at 9:50 AM Stephan Ewen <se...@apache.org> wrote:

> Hi Elias!
>
> I remember you brought this missing feature up in the past. Do you think
> the proposed enhancement would work for your use case?
>
> Best,
> Stephan
>
> ---------- Forwarded message ---------
> From: Kostas Kloudas <k.klou...@ververica.com>
> Date: Tue, Feb 12, 2019 at 5:28 PM
> Subject: [DISCUSS] FLIP-33: Terminate/Suspend Job with Savepoint
> To: <dev@flink.apache.org>
>
>
> Hi everyone,
>
> A commonly used functionality offered by Flink is the
> "cancel-with-savepoint" operation. When applied to the current exactly-once
> sinks, the current implementation of the feature can be problematic, as it
> does not guarantee that side-effects will be committed by Flink to the 3rd
> party storage system.
>
> This discussion targets fixing this issue and proposes the addition of two
> termination modes, namely:
> 1) SUSPEND, for temporarily stopping the job, e.g. for Flink version
> upgrading in your cluster
> 2) TERMINATE, for terminal shut down which ends the stream and sends
> MAX_WATERMARK time, and flushes any state associated with (event time)
> timers
>
> A google doc with the FLIP proposal can be found here:
>
> https://docs.google.com/document/d/1EZf6pJMvqh_HeBCaUOnhLUr9JmkhfPgn6Mre_z6tgp8/edit?usp=sharing
>
> And the page for the FLIP is here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212
>
> The implementation sketch is far from complete, but it is worth having a
> discussion on the semantics as soon as possible. The implementation section
> is going to be updated soon.
>
> Looking forward to the discussion,
> Kostas