+1 for the second option. How about we allow passing a flag that indicates whether a checkpoint should be taken together with the cancellation?
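
To make the flag idea concrete, here is a rough sketch of the semantics under discussion. All names (StopSketch, StopMode, Job, stop, checkpointFirst) are made up for illustration and are not existing Flink API. Treating "terminate" on a batch job as a no-op is just one of the two alternatives mentioned in the thread, and taking the checkpoint before the sources are stopped is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

public class StopSketch {

    /** Hypothetical request kinds: hard cancel vs. the proposed clean "terminate". */
    public enum StopMode { CANCEL, TERMINATE }

    /** Hypothetical stand-in for a running job; it only records what would happen. */
    public static class Job {
        public final boolean streaming;
        public final List<String> log = new ArrayList<>();
        public String state = "RUNNING";

        public Job(boolean streaming) {
            this.streaming = streaming;
        }
    }

    /**
     * Applies a stop request to a job and returns the resulting job state.
     * checkpointFirst is the proposed flag: take a checkpoint together with the stop.
     */
    public static String stop(Job job, StopMode mode, boolean checkpointFirst) {
        if (mode == StopMode.TERMINATE && !job.streaming) {
            // Assumption: "terminate" is ignored for batch jobs.
            // The alternative from the thread would be to interpret it as "cancel".
            job.log.add("terminate ignored for batch job");
            return job.state;
        }
        if (checkpointFirst && job.streaming) {
            job.log.add("checkpoint");
        }
        if (mode == StopMode.TERMINATE) {
            // Clean stop: sources stop emitting, operators finish regularly.
            job.log.add("stop sources");
            job.log.add("wait for operators to finish");
            job.state = "FINISHED";
        } else {
            // Hard cancel stays available, as today.
            job.log.add("cancel tasks");
            job.state = "CANCELED";
        }
        return job.state;
    }

    public static void main(String[] args) {
        Job job = new Job(true);
        System.out.println(stop(job, StopMode.TERMINATE, true) + " " + job.log);
    }
}
```

With this shape the existing cancel path is untouched, and the flag only matters for streaming jobs, where a checkpoint makes sense.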

On Wed, May 27, 2015 at 12:27 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> I would also prefer the second option. The first is rather a hack but not
> an option. :D
>
> On May 27, 2015 9:14 AM, "Márton Balassi" <balassi.mar...@gmail.com> wrote:
>
> > +1 for the second option:
> >
> > It would also provide the possibility to properly commit a state
> > checkpoint after the terminate message was triggered. In some cases
> > this can be a desirable behaviour.
> >
> > On Wed, May 27, 2015 at 8:46 AM, Gyula Fóra <gyf...@apache.org> wrote:
> >
> > > Hey,
> > >
> > > I would also strongly prefer the second option. Users need to have
> > > the option to force cancel a program in case of unwanted behaviour.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote (on Wed,
> > > May 27, 2015, 1:20):
> > >
> > > > Hi,
> > > >
> > > > currently, the only way to stop a streaming job is to "cancel" the
> > > > job. This has multiple disadvantages:
> > > > 1) a "clean" stop is not possible (see
> > > > https://issues.apache.org/jira/browse/FLINK-1929 -- I think a
> > > > clean stop is a prerequisite for FLINK-1929) and
> > > > 2) as a minor issue, all canceled jobs are listed as canceled in
> > > > the history (which is somewhat confusing for the user -- at least
> > > > it was for me when I started to work with Flink Streaming).
> > > >
> > > > This issue was raised a few times already; however, no final
> > > > conclusion was reached (if I remember correctly). I could not find
> > > > a JIRA for it either.
> > > >
> > > > From my understanding of the system, there would be two ways to
> > > > implement a nice way of stopping streaming jobs:
> > > >
> > > > 1) "Task"s can be distinguished between "batch" and "streaming"
> > > >    -> canceling a batch job works as always
> > > >    -> canceling a streaming job only sends a "canceling" signal to
> > > >       the sources, and waits until the job finishes (i.e., sources
> > > >       stop emitting data and finish regularly, triggering the
> > > >       finishing of all operators).
> > > > In this case, streaming jobs are stopped in a "clean way" (as if
> > > > the input had been finite) and the job will be listed as
> > > > "finished" in the history as usual.
> > > >
> > > > This approach has the advantage that it should be simpler to
> > > > implement. However, the disadvantages are (1) a "hard cancel" of
> > > > jobs is not possible any more, and (2) Flink must be able to
> > > > distinguish batch and streaming jobs (I don't think the Flink
> > > > runtime can distinguish both right now?)
> > > >
> > > > 2) A new message "terminate" (or similar) is introduced that can
> > > > only be used for streaming jobs (it would be ignored for batch
> > > > jobs); it stops the sources and waits until the job finishes
> > > > regularly.
> > > >
> > > > This approach has the advantage that the current system behavior
> > > > is preserved (it only adds a new feature). The disadvantage is
> > > > that all clients need to be touched and it must be clear to the
> > > > user that "terminate" does not work for batch jobs. If an
> > > > error/warning should be raised when a user tries to "terminate" a
> > > > batch job, Flink must be able to distinguish between batch and
> > > > streaming jobs, too. As an alternative, "terminate" on batch jobs
> > > > could be interpreted as "cancel".
> > > >
> > > > I personally think that the second approach is better. Please give
> > > > feedback. If we can get to a conclusion on how to implement it, I
> > > > would like to work on it.
> > > >
> > > > -Matthias