Re: [DISCUSS] Canceling Streaming Jobs

Stephan Ewen Wed, 27 May 2015 03:37:06 -0700

+1 for the second option.

How about we allow passing a flag that indicates whether a checkpoint
should be taken together with the cancellation?
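
To make the idea concrete, here is a minimal sketch in Python of how a
"terminate" request with an optional checkpoint flag could differ from a
hard "cancel". It is a toy model, not Flink code: StreamingJob, Source,
terminate, cancel, and the with_checkpoint parameter are all hypothetical
names chosen for illustration, and taking the checkpoint after the sources
have stopped is an assumption.

```python
from enum import Enum


class JobStatus(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    CANCELED = "canceled"


class Source:
    """Hypothetical source that emits records until asked to stop."""

    def __init__(self):
        self.running = True

    def stop(self):
        # Clean stop: the source ends as if its input had been finite.
        self.running = False


class StreamingJob:
    """Toy model of a streaming job; every name here is an assumption."""

    def __init__(self, sources):
        self.sources = sources
        self.status = JobStatus.RUNNING
        self.checkpoints = 0

    def cancel(self):
        # Hard cancel: stop immediately, no checkpoint, listed as canceled.
        if self.status is not JobStatus.RUNNING:
            return
        for source in self.sources:
            source.running = False
        self.status = JobStatus.CANCELED

    def terminate(self, with_checkpoint=False):
        # Clean stop: sources finish regularly, then the job finishes.
        if self.status is not JobStatus.RUNNING:
            return
        for source in self.sources:
            source.stop()
        if with_checkpoint:
            # Commit one final state checkpoint before finishing.
            self.checkpoints += 1
        self.status = JobStatus.FINISHED
```

In this model, terminate(with_checkpoint=True) leaves the job listed as
finished with one extra checkpoint, while cancel() leaves it listed as
canceled and takes no checkpoint.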



On Wed, May 27, 2015 at 12:27 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> I would also prefer the second option. The first is more of a hack than
> an option. :D
> On May 27, 2015 9:14 AM, "Márton Balassi" <balassi.mar...@gmail.com>
> wrote:
>
> > +1 for the second option:
> >
> > It would also make it possible to properly commit a state checkpoint
> > after the terminate message has been triggered. In some cases this can
> > be desirable behaviour.
> >
> > On Wed, May 27, 2015 at 8:46 AM, Gyula Fóra <gyf...@apache.org> wrote:
> >
> > > Hey,
> > >
> > > I would also strongly prefer the second option; users need to have the
> > > option to force-cancel a program in case of unwanted behaviour.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote (on Wed,
> > > May 27, 2015, 1:20):
> > >
> > > > Hi,
> > > >
> > > > currently, the only way to stop a streaming job is to "cancel" the
> > > > job. This has multiple disadvantages:
> > > >  1) a "clean" stop is not possible (see
> > > > https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean
> > > > stop is a prerequisite for FLINK-1929) and
> > > >  2) as a minor issue, every stopped streaming job is listed as
> > > > "canceled" in the history (which is somewhat confusing for the user
> > > > -- at least it was for me when I started to work with Flink
> > > > Streaming).
> > > >
> > > > This issue has been raised a few times already; however, no final
> > > > conclusion was reached (if I remember correctly). I could not find a
> > > > JIRA for it either.
> > > >
> > > > From my understanding of the system, there are two ways to
> > > > implement a clean stop for streaming jobs:
> > > >
> > > >   1) "Task"s are distinguished as either "batch" or "streaming"
> > > >      -> canceling a batch job works as always
> > > >      -> canceling a streaming job only sends a "canceling" signal to
> > > > the sources and waits until the job finishes (i.e., sources stop
> > > > emitting data and finish regularly, triggering the finishing of all
> > > > operators). In this case, streaming jobs are stopped in a "clean way"
> > > > (as if the input had been finite) and the job will be listed as
> > > > "finished" in the history.
> > > >
> > > >   This approach has the advantage that it should be simpler to
> > > > implement. However, the disadvantages are (1) a "hard cancel" of
> > > > jobs is not possible any more, and (2) Flink must be able to
> > > > distinguish batch and streaming jobs (I don't think the Flink runtime
> > > > can distinguish the two right now?)
> > > >
> > > >   2) A new message "terminate" (or similar) is introduced that can
> > > > only be used for streaming jobs (it would be ignored for batch jobs);
> > > > it stops the sources and waits until the job finishes regularly.
> > > >
> > > >   This approach has the advantage that current system behavior is
> > > > preserved (it only adds a new feature). The disadvantage is that all
> > > > clients need to be touched and it must be clear to the user that
> > > > "terminate" does not work for batch jobs. If an error/warning should
> > > > be raised when a user tries to "terminate" a batch job, Flink must be
> > > > able to distinguish between batch and streaming jobs, too. As an
> > > > alternative, "terminate" on batch jobs could be interpreted as
> > > > "cancel".
> > > >
> > > >
> > > > I personally think that the second approach is better. Please give
> > > > feedback. If we can reach a conclusion on how to implement it, I
> > > > would like to work on it.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > >
> >
>
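
For the open question in the original proposal, namely what "terminate"
should do when it is sent to a batch job, the three variants mentioned
there (ignore it, raise an error, or interpret it as "cancel") can be
sketched as follows. This is only an illustration in Python: BatchPolicy,
handle_terminate, and the returned action strings are hypothetical and do
not correspond to any Flink API. The sketch takes is_streaming as an input
in every branch for simplicity; whether the runtime can actually provide
that information is exactly the point raised in the proposal.

```python
from enum import Enum


class BatchPolicy(Enum):
    IGNORE = "ignore"        # "terminate" has no effect on batch jobs
    REJECT = "reject"        # report an error back to the client
    AS_CANCEL = "as_cancel"  # interpret "terminate" as "cancel"


def handle_terminate(is_streaming, policy):
    """Return the action a hypothetical job manager would take."""
    if is_streaming:
        # Streaming jobs always get the clean stop via their sources.
        return "stop-sources"
    if policy is BatchPolicy.IGNORE:
        return "no-op"
    if policy is BatchPolicy.AS_CANCEL:
        return "cancel"
    raise ValueError("terminate is only supported for streaming jobs")
```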
