[ https://issues.apache.org/jira/browse/FLINK-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954316#comment-16954316 ]
Zili Chen commented on FLINK-14434:
-----------------------------------

For now I would prefer option 2 only. With option 1 we could miss an exception thrown in JobManagerRunner#start: the user would be told that the submission succeeded, but a later request for the job result would fail with "job not found", because the start failed and the job manager runner future was removed without a result.

> Dispatcher#createJobManagerRunner should returns on creation succeed, not
> after startJobManagerRunner
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-14434
>                 URL: https://issues.apache.org/jira/browse/FLINK-14434
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zili Chen
>            Assignee: Zili Chen
>            Priority: Major
>             Fix For: 1.10.0
>
>         Attachments: patch.diff
>
>
> In an edge case, let's say
> 1) the job finishes nearly immediately
> 2) the Dispatcher is suspended in {{#startJobManagerRunner}} after
> {{jobManagerRunner.start();}} but before {{return jobManagerRunner;}}
> Because
> 1) the future stored in {{jobManagerRunnerFutures}} completes only once
> {{#startJobManagerRunner}} has finished, and
> 2) the creation of the JobManagerRunner doesn't happen in MainThread,
> the following execution order is possible:
> 1) the JobManagerRunner is created in an akka-dispatcher thread
> 2) that thread then applies {{Dispatcher#startJobManagerRunner}}
> 3) it runs up to {{jobManagerRunner.start();}} but not yet {{return jobManagerRunner;}}
> 4) the thread is suspended
> 5) the job finishes and its callback is executed on MainThread
> 6) {{jobManagerRunnerFutures.get(jobID).getNow(null)}} returns {{null}}
> because the akka-dispatcher thread hasn't reached {{return jobManagerRunner;}}
> 7) the Dispatcher reports {{There is a newer JobManagerRunner for the job}},
> although there actually is none.
> **Solution**
> There are two approaches, and we can even apply both.
> 1. return {{jobManagerRunnerFuture}} in {{#createJobManagerRunner}} and make
> {{#startJobManagerRunner}} a separate action
> 2. once the JobManagerRunner is created, execute {{#startJobManagerRunner}} in
> MainThread.
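
To make the execution order quoted above concrete, here is a minimal, single-threaded sketch of the check in step 6 and of what option 1 would change. It is not the Flink code: the class `Runner`, the helper `isRegisteredRunner`, and both scenario methods are hypothetical stand-ins, and the suspended akka-dispatcher thread is simulated simply by completing the future after the callback has run.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class RunnerFutureRaceSketch {

    /** Hypothetical stand-in for JobManagerRunner; not the Flink class. */
    static final class Runner {
        final String jobId;
        Runner(String jobId) { this.jobId = jobId; }
    }

    /** Models the step-6 check: is the registered runner the one that just finished? */
    static boolean isRegisteredRunner(
            Map<String, CompletableFuture<Runner>> futures, Runner finished) {
        CompletableFuture<Runner> future = futures.get(finished.jobId);
        // getNow(null) yields null while the future is still incomplete.
        return future != null && future.getNow(null) == finished;
    }

    /** Reported ordering: the registered future completes only after start returns. */
    static boolean finishBeforeStartReturns() {
        Map<String, CompletableFuture<Runner>> futures = new ConcurrentHashMap<>();
        Runner runner = new Runner("job-1");
        CompletableFuture<Runner> started = new CompletableFuture<>();
        futures.put(runner.jobId, started);
        // start() has run, but the thread is "suspended" before returning the runner.
        boolean seen = isRegisteredRunner(futures, runner); // job-finished callback
        started.complete(runner); // the suspended thread resumes, too late
        return seen;
    }

    /** Option 1: the registered future completes on creation; start is a separate action. */
    static boolean finishAfterCreationRegistered() {
        Map<String, CompletableFuture<Runner>> futures = new ConcurrentHashMap<>();
        Runner runner = new Runner("job-1");
        CompletableFuture<Runner> created = CompletableFuture.completedFuture(runner);
        futures.put(runner.jobId, created);
        created.thenAccept(r -> { /* starting the runner would happen here */ });
        return isRegisteredRunner(futures, runner);
    }

    public static void main(String[] args) {
        System.out.println("reported ordering sees runner: " + finishBeforeStartReturns());
        System.out.println("option 1 sees runner: " + finishAfterCreationRegistered());
    }
}
```

In the first scenario the callback does not recognise its own runner, which corresponds to the misleading "newer JobManagerRunner" report. The sketch deliberately does not model a failure inside the start step, which is the concern raised against option 1 in the comment above.
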
> CC [~trohrmann]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)