Hi, currently, the only way to stop a streaming job is to "cancel" the job, This has multiple disadvantage: 1) a "clean" stopping is not possible (see https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean stop is a pre-requirement for FLINK-1929) and 2) as a minor issue, all canceled jobs are listed as canceled in the history (what is somewhat confusing for the user -- at least it was for me when I started to work with Flink Streaming).
This issue was raised a few times already, however, no final conclusion was there (if I remember correctly). I could not find a JIRA for it either. From my understanding of the system, there would be two ways to implement a nice way for stopping streaming jobs: 1) "Task"s can be distinguished between "batch" and "streaming" -> canceling a batch jobs works as always -> canceling a streaming job only send a "canceling" signal to the sources, and waits until the job finishes (ie, sources stop emitting data and finish regularly, triggering the finishing of all operators). For this case, streaming jobs are stopped in a "clean way" (as is the input would have be finite) and the job will be listed as "finished" in the history regularly. This approach has the advantage, that it should be simpler to implement. However, the disadvantages are (1) a "hard canceling" of jobs is not possible any more, and (2) Flink must be able to distinguishes batch and streaming jobs (I don't think Flink runtime can distinguish both right now?) 2) A new message "terminate" (or similar) is introduced, that can only be used for streaming jobs (would be ignored for batch jobs) that stops the sources and waits until the job finishes regularly. This approach has the advantage, that current system behavior is preserved (it only adds a few feature). The disadvantage is, that all clients need to be touched and it must be clear to the user, that "terminate" does not work for streaming jobs. If an error/warning should be raised if a user tries to "terminate" a batch job, Flink must be able to distinguish between batch and streaming jobs, too. As an alternative, "terminate" on batch jobs could be interpreted as "cancel", too. I personally think, that the second approach is better. Please give feedback. If we can get to a conclusion how to implement it, I would like to work on it. -Matthias
signature.asc
Description: OpenPGP digital signature