Apologies for the late reply.

I think this is badly needed, but I fear we are adding complexity by
introducing two more stop commands.  We'll have cancel, stop,
terminate, and suspend.  We basically want to do two things: terminate a
job with prejudice or stop a job safely.

For the former, "cancel" is the appropriate term, and there should be no
need for a cancel-with-checkpoint option.  If the job was configured to use
externalized checkpoints and it ran long enough, a checkpoint will already
be available for it.

For the latter, "stop" is the appropriate term, and it means that a job
should process no messages after the checkpoint barrier and that it should
ensure that exactly-once sinks complete their two-phase commits
successfully.  If a savepoint was requested, one should be created.

So in my mind there are two commands, cancel and stop, with appropriate
semantics.  Emitting MAX_WATERMARK before the checkpoint barrier during
stop is merely an optional behavior, like the creation of a savepoint.  But
if a specific command for it is desired, then "drain" seems appropriate.
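To make the proposed split concrete, here is a minimal sketch in Python of the two commands, with savepoint creation and draining modeled as optional flags on stop rather than as separate commands. It is a hypothetical model of the semantics described above, not Flink's API: the names Command, StopOptions and shutdown_steps are made up for illustration, and rejecting options on cancel is an assumed design choice reflecting the argument that cancel needs none.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Command(Enum):
    CANCEL = "cancel"  # terminate the job with prejudice
    STOP = "stop"      # stop the job safely


@dataclass(frozen=True)
class StopOptions:
    savepoint: bool = False  # optionally create a savepoint at the final barrier
    drain: bool = False      # optionally emit MAX_WATERMARK before the barrier


def shutdown_steps(command: Command, options: StopOptions = StopOptions()) -> List[str]:
    """Return the ordered actions for a command (hypothetical model, not Flink code)."""
    if command is Command.CANCEL:
        # Assumption: cancel takes no options; a retained externalized
        # checkpoint, if any, is what a later restart would use.
        if options.savepoint or options.drain:
            raise ValueError("cancel takes no options")
        return ["cancel tasks immediately"]

    steps: List[str] = []
    if options.drain:
        # Optional: fire all pending event-time timers before the barrier.
        steps.append("emit MAX_WATERMARK")
    steps.append("inject checkpoint barrier")
    steps.append("stop processing records after the barrier")
    if options.savepoint:
        steps.append("write savepoint")
    # Exactly-once sinks commit only once the final checkpoint completes.
    steps.append("complete two-phase commits of exactly-once sinks")
    steps.append("finish job")
    return steps
```

With this shape, `shutdown_steps(Command.STOP, StopOptions(savepoint=True, drain=True))` covers what the FLIP calls TERMINATE, and `shutdown_steps(Command.STOP, StopOptions(savepoint=True))` covers SUSPEND, while the command set stays at two.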

On Tue, Feb 12, 2019 at 9:50 AM Stephan Ewen <se...@apache.org> wrote:

> Hi Elias!
>
> I remember you brought this missing feature up in the past. Do you think
> the proposed enhancement would work for your use case?
>
> Best,
> Stephan
>
> ---------- Forwarded message ---------
> From: Kostas Kloudas <k.klou...@ververica.com>
> Date: Tue, Feb 12, 2019 at 5:28 PM
> Subject: [DISCUSS] FLIP-33: Terminate/Suspend Job with Savepoint
> To: <dev@flink.apache.org>
>
>
> Hi everyone,
>
>  A commonly used feature of Flink is the
> "cancel-with-savepoint" operation. When applied to exactly-once
> sinks, the current implementation of this feature can be problematic, as it
> does not guarantee that side effects will be committed by Flink to the
> third-party storage system.
>
>  This discussion targets fixing this issue and proposes the addition of two
> termination modes, namely:
>     1) SUSPEND, for temporarily stopping the job, e.g. for Flink version
> upgrades in your cluster
>     2) TERMINATE, for terminal shutdown, which ends the stream, emits
> MAX_WATERMARK, and flushes any state associated with (event-time)
> timers
>
> A google doc with the FLIP proposal can be found here:
>
> https://docs.google.com/document/d/1EZf6pJMvqh_HeBCaUOnhLUr9JmkhfPgn6Mre_z6tgp8/edit?usp=sharing
>
> And the page for the FLIP is here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212
>
>  The implementation sketch is far from complete, but it is worth having a
> discussion on the semantics as soon as possible. The implementation section
> is going to be updated soon.
>
>  Looking forward to the discussion,
>  Kostas
>
> --
>
> Kostas Kloudas | Software Engineer
>
