Stephan Ewen created FLINK-24343:
------------------------------------

             Summary: Revisit Scheduler and Coordinator Startup Procedure
                 Key: FLINK-24343
                 URL: https://issues.apache.org/jira/browse/FLINK-24343
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.2, 1.14.0
            Reporter: Stephan Ewen
             Fix For: 1.15.0

We need to re-examine the startup procedure of the scheduler and how it interacts with the startup of the operator coordinators.

We need to make sure the following conditions are met:

  - The operator coordinators are started before the first action happens that they need to be informed of. That includes a task being ready, a checkpoint happening, etc.

  - The scheduler must be started to the point that it can handle "failGlobal()" calls, because the coordinators might trigger that during their startup when an exception in "start()" occurs.

/cc [~chesnay]
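
To make the two conditions concrete, here is a minimal, self-contained sketch of one possible startup ordering. It is not Flink code: `StartupOrderSketch`, its nested `Scheduler` and `Coordinator` types, `startScheduling()`, `taskReady()`, and the event strings are all hypothetical names made up for illustration. Only "failGlobal()" and "start()" are taken from the description above. The sketch assumes a single-threaded startup and ignores checkpoints.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these types are actual Flink classes.
final class StartupOrderSketch {

    /** Stand-in for an operator coordinator (assumed, simplified interface). */
    interface Coordinator {
        void start() throws Exception;

        void taskReady(int subtask);
    }

    /** Stand-in for the scheduler; records the order in which things happen. */
    static final class Scheduler {
        final List<String> events = new ArrayList<>();
        private final List<Coordinator> coordinators;
        private boolean acceptsGlobalFailures;

        Scheduler(List<Coordinator> coordinators) {
            this.coordinators = coordinators;
        }

        void failGlobal(Throwable cause) {
            // Second condition: this must already be legal while coordinators start.
            if (!acceptsGlobalFailures) {
                throw new IllegalStateException("failGlobal() called before the scheduler was ready");
            }
            events.add("failGlobal:" + cause.getMessage());
        }

        void startScheduling() {
            // Step 1: bring the scheduler far enough up to handle failGlobal().
            acceptsGlobalFailures = true;
            events.add("scheduler-ready");

            // Step 2: start all coordinators; a failure in start() goes to failGlobal().
            for (Coordinator coordinator : coordinators) {
                try {
                    coordinator.start();
                    events.add("coordinator-started");
                } catch (Exception e) {
                    failGlobal(e);
                    return; // no tasks are deployed after a global failure
                }
            }

            // Step 3: only now trigger actions the coordinators must be told about.
            events.add("tasks-deployed");
            for (Coordinator coordinator : coordinators) {
                coordinator.taskReady(0);
            }
            events.add("task-ready-notified");
        }
    }

    /** Runs the startup once with a coordinator that optionally fails in start(). */
    static List<String> run(boolean failOnStart) {
        Coordinator coordinator = new Coordinator() {
            @Override
            public void start() throws Exception {
                if (failOnStart) {
                    throw new Exception("boom");
                }
            }

            @Override
            public void taskReady(int subtask) {
                // A real coordinator would react here; the sketch does nothing.
            }
        };
        List<Coordinator> coordinators = new ArrayList<>();
        coordinators.add(coordinator);
        Scheduler scheduler = new Scheduler(coordinators);
        scheduler.startScheduling();
        return scheduler.events;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In this sketch, a coordinator that throws in "start()" reaches a scheduler that already accepts "failGlobal()", and no task-ready notification can be delivered before every coordinator has started. Whether the real fix orders things this way, or instead buffers early notifications, is left open by the description above.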