I agree with Till that we should not change the semantics of per-job mode. In my opinion, per-job mode means that the cluster (JobManager) is brought up with one job and only executes that one job. There should be no open ports or anything else that would allow submitting further jobs. This is very important for deployments on Docker/Kubernetes or other environments where you bring up jobs without necessarily having the notion of a Flink cluster.
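To make this concrete, here is a rough sketch of the kind of user program I mean, with two execute() calls (a minimal sketch using the DataSet API; the pipelines, job names, and output paths are made up purely for illustration):

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.java.ExecutionEnvironment;

public class TwoJobsProgram {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // First job: in per-job mode this execute() brings up a dedicated
        // cluster that runs exactly this one job.
        env.fromElements(1, 2, 3)
                .map(i -> i * 2)
                .writeAsText("/tmp/job-1-output");
        JobExecutionResult first = env.execute("job-1");

        // Result processing happens on the client; in "detached" mode this
        // is never reached because we "throw" out of the first execute().
        System.out.println("job-1 took " + first.getNetRuntime() + " ms");

        // Second job: only submitted if the client is still alive, because
        // the client drives execution. In per-job mode this brings up a
        // second, fresh cluster.
        env.fromElements("a", "b", "c")
                .map(String::toUpperCase)
                .writeAsText("/tmp/job-2-output");
        env.execute("job-2");
    }
}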
What this means for a user program with multiple execute() calls (like the sketch above) is that you get a fresh cluster for each execute() call. It also means that further execute() calls only happen if the “client” is still alive, because it is the one driving execution. Currently, this only works if you start the job in “attached” mode; if you start in “detached” mode, only the first execute() happens and the rest are ignored.

This brings us to the tricky question of what to do about “detached” and “attached”. In the long run, I would like to get rid of the distinction and leave it up to the user program, by either blocking or not blocking on the Future (or JobClient or whatnot) that job submission returns. This, however, means that users cannot simply request “detached” execution when using bin/flink; the user program has to play along. On the other hand, “detached” mode is quite strange from the user program’s point of view: execute() either returns a proper job result after the job ran (in “attached” mode) or a dummy result right after submission (in “detached” mode). I think this can even lead to weird cases where multiple execute() calls run in parallel. For per-job detached mode we also “throw” out of the first execute(), so the rest of the program (including result-processing logic) is ignored.

For FLIP-73 itself, we can (and should) ignore these problems, because FLIP-73 only moves the existing submission logic behind a reusable abstraction and makes it usable via an API. We should closely follow up on the points above, though, because I think they are also important.

Best,
Aljoscha

> On 2. Oct 2019, at 12:08, Zili Chen <wander4...@gmail.com> wrote:
> 
> Thanks for your clarification, Till.
> 
> I agree with the current semantics of the per-job mode: one should deploy a
> new cluster for each part of the job. Apart from the performance concern,
> it also means that the PerJobExecutor actually knows how to deploy a cluster,
> which is different from the description that an Executor submits a job.
> 
> Anyway, it sounds workable and narrows the changes.