+1 for the second option: it would also make it possible to properly commit a state checkpoint after the terminate message has been triggered. In some cases this can be desirable behaviour.
On Wed, May 27, 2015 at 8:46 AM, Gyula Fóra <gyf...@apache.org> wrote:

> Hey,
>
> I would also strongly prefer the second option. Users need to have the
> option to force-cancel a program in case of unwanted behaviour.
>
> Cheers,
> Gyula
>
> Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote (on Wed, May 27,
> 2015, 1:20):
>
> > Hi,
> >
> > currently, the only way to stop a streaming job is to "cancel" the job.
> > This has multiple disadvantages:
> > 1) a "clean" stop is not possible (see
> > https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean stop
> > is a prerequisite for FLINK-1929) and
> > 2) as a minor issue, all stopped streaming jobs are listed as "canceled"
> > in the history (which is somewhat confusing for the user -- at least it
> > was for me when I started to work with Flink Streaming).
> >
> > This issue was raised a few times already; however, no final conclusion
> > was reached (if I remember correctly). I could not find a JIRA for it
> > either.
> >
> > From my understanding of the system, there would be two ways to
> > implement a nice way of stopping streaming jobs:
> >
> > 1) "Task"s can be distinguished between "batch" and "streaming"
> > -> canceling a batch job works as always
> > -> canceling a streaming job only sends a "canceling" signal to the
> > sources and waits until the job finishes (i.e., sources stop emitting
> > data and finish regularly, triggering the finishing of all operators).
> > For this case, streaming jobs are stopped in a "clean way" (as if the
> > input had been finite) and the job will be listed as "finished" in
> > the history regularly.
> >
> > This approach has the advantage that it should be simpler to
> > implement. However, the disadvantages are (1) a "hard canceling" of jobs
> > is not possible any more, and (2) Flink must be able to distinguish
> > batch and streaming jobs (I don't think the Flink runtime can distinguish
> > both right now?)
> >
> > 2) A new message "terminate" (or similar) is introduced that can only
> > be used for streaming jobs (it would be ignored for batch jobs); it stops
> > the sources and waits until the job finishes regularly.
> >
> > This approach has the advantage that current system behavior is
> > preserved (it only adds a new feature). The disadvantage is that all
> > clients need to be touched and it must be clear to the user that
> > "terminate" does not work for batch jobs. If an error/warning should
> > be raised when a user tries to "terminate" a batch job, Flink must be
> > able to distinguish between batch and streaming jobs, too. As an
> > alternative, "terminate" on batch jobs could be interpreted as "cancel",
> > too.
> >
> > I personally think that the second approach is better. Please give
> > feedback. If we can reach a conclusion on how to implement it, I would
> > like to work on it.
> >
> > -Matthias
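
To make the difference between "cancel" and the proposed "terminate" concrete, here is a rough toy model in Python. It is only a sketch of the semantics discussed in this thread, not Flink code: ToySource, ToyJob, JobState and all method names are hypothetical and made up for illustration. It assumes that "terminate" stops the sources, lets in-flight records drain, commits a final checkpoint (here simply the number of processed records) and then marks the job as finished, and that "terminate" is ignored for batch jobs.

```python
import enum
import itertools


class JobState(enum.Enum):
    RUNNING = "running"
    FINISHED = "finished"
    CANCELED = "canceled"


class ToySource:
    """Hypothetical unbounded source (not a Flink class)."""

    def __init__(self):
        self._counter = itertools.count()
        self.stopped = False

    def stop(self):
        self.stopped = True

    def poll(self):
        # An unbounded source only runs dry once it has been told to stop.
        return None if self.stopped else next(self._counter)


class ToyJob:
    """Hypothetical job with one source and one operator (not a Flink class)."""

    def __init__(self, source, streaming=True):
        self.source = source
        self.streaming = streaming
        self.state = JobState.RUNNING
        self.in_flight = []     # records emitted but not yet processed
        self.processed = []     # records the operator has handled
        self.checkpoint = None  # assumed checkpoint: count of processed records

    def emit(self):
        record = self.source.poll()
        if record is not None:
            self.in_flight.append(record)

    def process(self):
        if self.in_flight:
            self.processed.append(self.in_flight.pop(0))

    def cancel(self):
        # Hard stop: in-flight records are dropped and nothing is committed.
        self.in_flight.clear()
        self.state = JobState.CANCELED

    def terminate(self):
        # Clean stop as proposed in option 2; ignored for batch jobs.
        if not self.streaming:
            return
        self.source.stop()
        while self.in_flight:
            self.process()
        self.checkpoint = len(self.processed)
        self.state = JobState.FINISHED
```

With this model, a job that has emitted three records and processed one ends up FINISHED with all three records processed after terminate(), but CANCELED with only one record processed after cancel(). Keeping both calls is what preserves the "hard cancel" that Gyula asks for.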