[ https://issues.apache.org/jira/browse/FLINK-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955058#comment-16955058 ]
Zili Chen commented on FLINK-14434:
-----------------------------------

Thanks for your reply [~trohrmann]. I found a possibly better option 3 for this issue. The diff is tiny and expressive.

https://github.com/TisonKun/flink/commit/baacecb92f11cd367dc89bc48744fea6de94670b

In short, the problem described above arises because, when the "job manager runner result future" calls back, the "job manager runner created future" is conceptually finished but not yet completed. If we revisit the semantics of {{#createJobManagerRunner}}, we can simply return a future that represents the creation and let the caller take care of the start.

Compared with option 2, this approach has clear semantics and one subtle difference: it starts the {{JobManagerRunner}} in an akka-dispatcher thread, not in the MainThread. Internally we have an issue if starting the JM runner happens in the Dispatcher MainThread [1], but that issue doesn't exist in the community codebase.

I'm glad to help with FLINK-11843 and FLINK-11719 on the review side. For this issue I'd like to send this tiny patch as a pull request so that you can coordinate patches depending on your schedule.

[1] FYI: It is an interesting case, but it deviates a bit from the community codebase. We moved the job registry entirely into the Dispatcher, so when a job manager runner is granted leadership it sends an RPC to the Dispatcher to query the current job scheduling status. Our fork is currently based on 1.7, so solution option 2 above leads to a deadlocking execution order:

1. the job manager runner's {{#start}} is called in the Dispatcher MainThread
2. the job manager runner's leader election service starts and, if it is standalone (non-HA), directly calls grantLeadership
3. on being granted leadership, the job manager runner sends an RPC to the Dispatcher for the query and waits for the result
4. because {{#start}} occupies the MainThread, that RPC cannot be processed
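For illustration only, the execution order above can be reproduced in miniature with plain {{java.util.concurrent}}. This is a sketch under stated assumptions, not Flink code: a single-threaded executor stands in for the Dispatcher MainThread, a task submitted to it stands in for the RPC, and every class and method name below is hypothetical. A timeout replaces the indefinite wait so that the example terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in; none of these names are Flink classes. */
public class MainThreadDeadlockSketch {

    /**
     * Runs a stand-in for "start" on startThread. While starting, it sends an
     * "RPC" to mainThread and waits for the answer. Returns true if the RPC was
     * answered within the timeout, false otherwise.
     */
    static boolean rpcAnswered(ExecutorService mainThread,
                               ExecutorService startThread,
                               long timeoutMillis) {
        CompletableFuture<Boolean> answered = CompletableFuture.supplyAsync(() -> {
            // Steps 2-3: leadership is granted synchronously, then the runner
            // queries the "Dispatcher" by submitting work to its main thread.
            CompletableFuture<String> rpc =
                    CompletableFuture.supplyAsync(() -> "PENDING", mainThread);
            try {
                rpc.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Step 4: the main thread is occupied by this very task,
                // so the RPC was not served in time.
                return false;
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }, startThread);
        return answered.join();
    }

    /** Option 2 in miniature: "start" runs on the main thread itself. */
    static boolean startOnMainThread(long timeoutMillis) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            return rpcAnswered(mainThread, mainThread, timeoutMillis);
        } finally {
            mainThread.shutdownNow();
        }
    }

    /** Option 3 in miniature: "start" runs on a different thread. */
    static boolean startOffMainThread(long timeoutMillis) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService otherThread = Executors.newSingleThreadExecutor();
        try {
            return rpcAnswered(mainThread, otherThread, timeoutMillis);
        } finally {
            mainThread.shutdownNow();
            otherThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("start on main thread, RPC answered: " + startOnMainThread(200));
        System.out.println("start off main thread, RPC answered: " + startOffMainThread(5000));
    }
}
```

In the first call the task that waits for the RPC is the only thing the single thread can run, so the RPC stays queued until the wait gives up. In the second call the main thread is free to serve it.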

We can work around this case in many ways, such as dispatching the action a bit differently, but it suggests that if we can schedule an action outside the Dispatcher MainThread without relying on the synchronization that the single thread provides, we had better do so.

> Dispatcher#createJobManagerRunner should returns on creation succeed, not
> after startJobManagerRunner
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-14434
>                 URL: https://issues.apache.org/jira/browse/FLINK-14434
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zili Chen
>            Assignee: Zili Chen
>            Priority: Major
>             Fix For: 1.10.0
>
>         Attachments: patch.diff
>
>
> In an edge case, let's say
> 1) the job finishes nearly immediately
> 2) the Dispatcher is suspended in {{#startJobManagerRunner}} after
> {{jobManagerRunner.start();}} but before {{return jobManagerRunner;}}
> Because
> 1) the future we put into {{jobManagerRunnerFutures}} completes only when
> {{#startJobManagerRunner}} has finished, and
> 2) the creation of the JobManagerRunner doesn't happen in the MainThread,
> the following execution order is possible:
> 1) the JobManagerRunner is created in an akka-dispatcher thread
> 2) that thread then applies {{Dispatcher#startJobManagerRunner}}
> 3) it runs up to {{jobManagerRunner.start();}} but not yet {{return jobManagerRunner;}}
> 4) this thread is suspended
> 5) the job finishes and its callback executes on the MainThread
> 6) {{jobManagerRunnerFutures.get(jobID).getNow(null)}} returns {{null}}
> because the akka-dispatcher thread hasn't reached {{return jobManagerRunner;}}
> 7) it reports {{There is a newer JobManagerRunner for the job}} although
> there actually isn't one.
> **Solution**
> There are two options, and we can even apply both.
> 1. return {{jobManagerRunnerFuture}} from {{#createJobManagerRunner}} and make
> {{#startJobManagerRunner}} a separate action
> 2. once the JobManagerRunner is created, execute {{#startJobManagerRunner}} in the
> MainThread.
> CC [~trohrmann]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)