Zili Chen created FLINK-14434:
---------------------------------

Summary: Dispatcher#createJobManagerRunner should returns on creation succeed, not after startJobManagerRunner
Key: FLINK-14434
URL: https://issues.apache.org/jira/browse/FLINK-14434
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zili Chen
Assignee: Zili Chen
Fix For: 1.10.0
Consider the following edge case: (1) the job finishes almost immediately, and (2) the thread running {{Dispatcher#startJobManagerRunner}} is suspended after {{jobManagerRunner.start();}} but before {{return jobManagerRunner;}}.

This can happen because:
1) the future we put into {{jobManagerRunnerFutures}} completes only after {{#startJobManagerRunner}} has finished, and
2) the JobManagerRunner is not created in the main thread.

A possible execution order is:
1) The JobManagerRunner is created in an akka-dispatcher thread.
2) {{Dispatcher#startJobManagerRunner}} is then applied in that same thread.
3) It gets past {{jobManagerRunner.start();}} but has not yet reached {{return jobManagerRunner;}}.
4) The thread is suspended.
5) The job finishes, and its callback executes in the main thread.
6) {{jobManagerRunnerFutures.get(jobID).getNow(null)}} returns {{null}}, because the akka-dispatcher thread has not yet executed {{return jobManagerRunner;}}.
7) The callback reports {{There is a newer JobManagerRunner for the job}}, although no newer runner exists.

**Solution**

There are two approaches, and we could even apply both:
1. Return {{jobManagerRunnerFuture}} from {{#createJobManagerRunner}} and turn {{#startJobManagerRunner}} into a separate action applied to that future.
2. Once the JobManagerRunner has been created, execute {{#startJobManagerRunner}} in the main thread.

CC [~trohrmann]
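The race and the two proposed fixes can be modeled outside Flink with plain {{CompletableFuture}}s. The sketch below is a hypothetical, simplified model, not Flink's Dispatcher code: {{Runner}}, {{registerCallback}}, {{JOB_ID}} and the two single-thread executors standing in for the main thread and the akka-dispatcher thread are all assumptions. In the buggy variant, the {{sameRunner.join()}} call forces the suspension from step 4 so that the outcome is deterministic rather than timing-dependent.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of FLINK-14434; none of these names are Flink classes. */
public class RunnerRaceDemo {
    static final String JOB_ID = "job-1";

    /** Stand-in for JobManagerRunner: its job finishes as soon as it starts. */
    static final class Runner {
        final CompletableFuture<Void> jobFinished = new CompletableFuture<>();
        void start() { jobFinished.complete(null); }
    }

    /** Job-finished callback on the main thread: is the registered runner still this one? */
    static void registerCallback(Runner runner,
                                 Map<String, CompletableFuture<Runner>> runnerFutures,
                                 CompletableFuture<Boolean> sameRunner,
                                 Executor mainThread) {
        runner.jobFinished.thenRunAsync(() -> {
            Runner current = runnerFutures.get(JOB_ID).getNow(null);
            // false corresponds to "There is a newer JobManagerRunner for the job"
            sameRunner.complete(current == runner);
        }, mainThread);
    }

    /** Returns true if the callback found the same runner in the map. */
    static boolean run(boolean fixed) throws Exception {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService ioThread = Executors.newSingleThreadExecutor(); // "akka-dispatcher"
        Map<String, CompletableFuture<Runner>> runnerFutures = new ConcurrentHashMap<>();
        CompletableFuture<Boolean> sameRunner = new CompletableFuture<>();
        try {
            // Like the Dispatcher, register the future from within the main thread.
            mainThread.execute(() -> {
                CompletableFuture<Runner> created =
                        CompletableFuture.supplyAsync(Runner::new, ioThread);
                if (fixed) {
                    // Option 1: the map holds the creation future itself ...
                    runnerFutures.put(JOB_ID, created);
                    // Option 2: ... and start() runs on the main thread.
                    created.thenAcceptAsync(runner -> {
                        registerCallback(runner, runnerFutures, sameRunner, mainThread);
                        runner.start();
                    }, mainThread);
                } else {
                    // Buggy: the map holds a future that completes only after start()
                    // returns, and start() runs off the main thread.
                    CompletableFuture<Runner> started = created.thenApplyAsync(runner -> {
                        registerCallback(runner, runnerFutures, sameRunner, mainThread);
                        runner.start();
                        sameRunner.join(); // "suspended" before returning the runner
                        return runner;
                    }, ioThread);
                    runnerFutures.put(JOB_ID, started);
                }
            });
            return sameRunner.get(5, TimeUnit.SECONDS);
        } finally {
            mainThread.shutdownNow();
            ioThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("buggy: " + run(false));
        System.out.println("fixed: " + run(true));
    }
}
```

Running {{main}} should print {{buggy: false}} followed by {{fixed: true}}. In the fixed variant the map holds the creation future, which is already complete when {{start()}} runs; and because {{start()}} and the job-finished callback share the single main-thread executor, the callback cannot run until {{start()}} has returned.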