[GitHub] [flink] XComp commented on pull request #21137: [FLINK-29234][runtime] JobMasterServiceLeadershipRunner handle leader event in a separate executor to avoid dead lock

GitBox Wed, 02 Nov 2022 09:17:00 -0700


XComp commented on PR #21137:
URL: https://github.com/apache/flink/pull/21137#issuecomment-1300806942

> To me the issue stems more from both the runner and election service
> calling into each other under locks (== fundamental issue that should never
> happen), and locks maybe being way too broad.
> For example, why is Runner#closeAsync doing the entire shutdown under the
> lock? Modifying the state should suffice, because all other operations are
> checking that it's running.

As far as we concluded, the problem appears if the `CompletableFuture`
that's returned by `JobMasterServiceProcess#closeAsync` (see
[JobMasterServiceLeadershipRunner.java:145](https://github.com/apache/flink/blob/113299701cc0c41bf7fc4bbe86cebd3beea8dbe3/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L145))
completes before the follow-up callback is registered (the one that releases the
`ClassLoaderLease` and stops the `DefaultLeaderElectionService`). In that case,
the entire callback will be executed in the synchronized block right away, which
is not what we want. We could work around that issue by calling
`runAfterwardsAsync` instead, which would make sure that the follow-up calls
are not executed within the synchronized block. It feels like a similar pattern
(i.e. introducing an async execution in a separate thread) to what we have in
the PR right now (just in a different location). Does that sound more
reasonable?
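
To illustrate the behaviour described above, here is a minimal sketch that uses only the JDK's `CompletableFuture`, not the actual Flink classes. The class name `CallbackUnderLockDemo`, the `lock` object and the helper method are hypothetical, and `thenRun`/`thenRunAsync` merely stand in for the synchronous and asynchronous variants of the Flink utility. It shows that a callback chained onto an already-completed future runs in the calling thread, i.e. while that thread still holds the monitor, whereas the async variant runs on an executor thread that does not hold it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class CallbackUnderLockDemo {

    // Hypothetical stand-in for the runner's lock object.
    private static final Object lock = new Object();

    /** Returns true if the callback was executed by a thread holding {@code lock}. */
    static boolean callbackRanUnderLock(boolean async) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean heldLock = new AtomicBoolean();
        try {
            CompletableFuture<Void> result;
            synchronized (lock) {
                // Stand-in for a closeAsync() future that has already completed.
                CompletableFuture<Void> closeFuture = CompletableFuture.completedFuture(null);
                // Records whether the executing thread owns the monitor of `lock`.
                Runnable callback = () -> heldLock.set(Thread.holdsLock(lock));
                result = async
                        ? closeFuture.thenRunAsync(callback, executor) // runs on the executor thread
                        : closeFuture.thenRun(callback);               // runs right here, inside the block
            }
            result.join(); // wait until the callback has finished
        } finally {
            executor.shutdown();
        }
        return heldLock.get();
    }

    public static void main(String[] args) {
        System.out.println("thenRun under lock: " + callbackRanUnderLock(false));     // true
        System.out.println("thenRunAsync under lock: " + callbackRanUnderLock(true)); // false
    }
}
```

In the synchronous case the callback inherits the caller's lock, which is how a callback that calls into another locked component could end up in a lock-ordering problem. The async variant avoids that at the cost of an extra thread hop.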

