I agree with Till that we should not change the semantics of per-job mode. In 
my opinion per-job mode means that the cluster (JobManager) is brought up with 
one job and it only executes that one job. There should be no open 
ports/anything that would allow submitting further jobs. This is very important 
for deployments in Docker/Kubernetes or other environments where you bring up 
jobs without necessarily having the notion of a Flink cluster.

What this means for a user program that has multiple execute() calls is that 
you will get a fresh cluster for each execute() call. This also means that 
further execute() calls will only happen if the “client” is still alive, 
because it is the one driving execution. Currently, this only works if you 
start the job in “attached” mode. If you start in “detached” mode, only the 
first execute() will happen and the rest will be ignored.
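
To make this concrete, here is a rough sketch in plain Java. It is not the 
actual Flink API: PerJobEnvironment, DetachedSubmissionException and 
runUserProgram are hypothetical names that only model the behaviour described 
above, and "starting a cluster" is just an entry in a list.

```java
import java.util.ArrayList;
import java.util.List;

public class PerJobSketch {

    /** Hypothetical signal that ends the user program after a detached submission. */
    static class DetachedSubmissionException extends RuntimeException {}

    /** Hypothetical stand-in for an execution environment in per-job mode. */
    static class PerJobEnvironment {
        final boolean attached;
        final List<String> clustersStarted = new ArrayList<>();

        PerJobEnvironment(boolean attached) {
            this.attached = attached;
        }

        void execute(String jobName) {
            // Per-job mode: every execute() call gets its own fresh cluster.
            clustersStarted.add("cluster-for-" + jobName);
            if (!attached) {
                // Detached: the client does not keep driving the program.
                throw new DetachedSubmissionException();
            }
        }
    }

    /** A user program with two execute() calls; returns the clusters it caused to start. */
    static List<String> runUserProgram(boolean attached) {
        PerJobEnvironment env = new PerJobEnvironment(attached);
        try {
            env.execute("first");
            env.execute("second"); // only reached while the client is still alive
        } catch (DetachedSubmissionException e) {
            // The rest of the user program is skipped.
        }
        return env.clustersStarted;
    }

    public static void main(String[] args) {
        System.out.println(runUserProgram(true));  // [cluster-for-first, cluster-for-second]
        System.out.println(runUserProgram(false)); // [cluster-for-first]
    }
}
```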

This brings us to the tricky question about what to do about “detached” and 
“attached”. In the long run, I would like to get rid of the distinction and 
leave it up to the user program, by either blocking or not on the Future (or 
JobClient or whatnot) that job submission returns. This, however, means that 
users cannot simply request “detached” execution when using bin/flink; the user 
program has to “play along”. On the other hand, “detached” mode is quite 
strange for the user program. The execute() call either returns with a proper 
job result after the job ran (in “attached” mode) or with a dummy result (in 
“detached” mode) right after submission. I think this can even lead to weird 
cases where multiple execute() calls run in parallel. For per-job detached mode we 
also “throw” out of the first execute() call, so the rest (including result 
processing logic) is ignored.
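
As a rough illustration of leaving the choice to the user program, here is a 
sketch in plain Java. The submit() method is hypothetical and only stands in 
for whatever job submission would eventually return (a Future, a JobClient or 
similar); nothing here is existing Flink API.

```java
import java.util.concurrent.CompletableFuture;

public class SubmitSketch {

    /** Hypothetical submission call: returns a future that completes with the job result. */
    static CompletableFuture<String> submit(String jobName) {
        return CompletableFuture.supplyAsync(() -> "result-of-" + jobName);
    }

    /** What "attached" would become: the user program blocks on the future. */
    static String runAttached(String jobName) {
        return submit(jobName).join();
    }

    /** What "detached" would become: the user program keeps the handle and moves on. */
    static CompletableFuture<String> runDetached(String jobName) {
        return submit(jobName);
    }

    public static void main(String[] args) {
        // Blocks until the job is done and prints its result.
        System.out.println(runAttached("a"));

        // Returns right after submission; the program decides if and when to wait.
        CompletableFuture<String> handle = runDetached("b");
        System.out.println("submitted b without waiting");
        System.out.println(handle.join());
    }
}
```

With something like this there is no dummy result: the program either has the 
real result or a handle it has chosen not to wait on.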

For FLIP-73 itself we can (and should) ignore these problems, because 
FLIP-73 only moves the existing submission logic behind a reusable abstraction 
and makes it usable via API. We should closely follow up on the above points 
though because I think they are also important.

Best,
Aljoscha

> On 2. Oct 2019, at 12:08, Zili Chen <wander4...@gmail.com> wrote:
> 
> Thanks for your clarification Till.
> 
> I agree with the current semantics of the per-job mode, one should deploy a
> new cluster for each part of the job. Apart from the performance concern
> it also means that PerJobExecutor knows how to deploy a cluster actually,
> which is different from the description that an Executor submits a job.
> 
> Anyway, it sounds workable and narrows the changes.
