Stephan Ewen created FLINK-24343:
------------------------------------

             Summary: Revisit Scheduler and Coordinator Startup Procedure
                 Key: FLINK-24343
                 URL: https://issues.apache.org/jira/browse/FLINK-24343
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.2, 1.14.0
            Reporter: Stephan Ewen
             Fix For: 1.15.0


We need to re-examine the startup procedure of the scheduler, and how it 
interacts with the startup of the operator coordinators.

We need to make sure the following conditions are met:
  - The Operator Coordinators are started before the first event occurs that 
they need to be informed of. That includes a task becoming ready, a checkpoint 
being triggered, etc.

  - The scheduler must be started far enough that it can handle 
"failGlobal()" calls, because a coordinator might trigger such a call during its 
own startup when an exception occurs in "start()".
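
Together, the two conditions amount to an ordering constraint. The following is a minimal single-threaded Java sketch of that ordering, assuming a heavily simplified model. The Scheduler, Coordinator and CheckingCoordinator types and the startScheduling()/taskReady() methods are hypothetical illustrations, not Flink's actual scheduler or OperatorCoordinator API. The sketch first makes failGlobal() safe to call, then starts the coordinators (turning an exception from start() into a global failure), and only after that lets tasks become ready.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the startup ordering described in the issue.
// Scheduler, Coordinator and all method names here are illustrative only;
// they are NOT Flink's actual scheduler or OperatorCoordinator API.
public class StartupOrderingSketch {

    // Hypothetical coordinator: must be started before it hears about tasks.
    interface Coordinator {
        void start() throws Exception;
        void taskReady(int subtask);
    }

    // Helper that rejects notifications arriving before start() succeeded.
    static class CheckingCoordinator implements Coordinator {
        private final boolean failOnStart;
        private boolean started = false;
        final List<Integer> readySubtasks = new ArrayList<>();

        CheckingCoordinator(boolean failOnStart) { this.failOnStart = failOnStart; }

        @Override
        public void start() throws Exception {
            if (failOnStart) {
                throw new Exception("boom");
            }
            started = true;
        }

        @Override
        public void taskReady(int subtask) {
            if (!started) {
                throw new IllegalStateException("taskReady() before start()");
            }
            readySubtasks.add(subtask);
        }
    }

    static class Scheduler {
        private boolean canFailGlobal = false;
        private boolean failed = false;
        final List<String> events = new ArrayList<>();

        void failGlobal(Throwable cause) {
            // Condition 2: this must already work while coordinators are starting.
            if (!canFailGlobal) {
                throw new IllegalStateException("scheduler cannot handle failGlobal() yet", cause);
            }
            failed = true;
            events.add("failGlobal: " + cause.getMessage());
        }

        void startScheduling(List<? extends Coordinator> coordinators, int parallelism) {
            // Step 1: make failGlobal() safe to call.
            canFailGlobal = true;
            events.add("failover-ready");

            // Step 2: start every coordinator; an exception in start() becomes a global failure.
            for (Coordinator c : coordinators) {
                try {
                    c.start();
                    events.add("coordinator-started");
                } catch (Exception e) {
                    failGlobal(e);
                }
            }
            if (failed) {
                return; // leave recovery to the failover path instead of deploying tasks
            }

            // Step 3 (condition 1): tasks become ready only after all coordinators run.
            for (int i = 0; i < parallelism; i++) {
                for (Coordinator c : coordinators) {
                    c.taskReady(i);
                }
            }
            events.add("tasks-deployed");
        }
    }

    public static void main(String[] args) {
        Scheduler ok = new Scheduler();
        ok.startScheduling(List.of(new CheckingCoordinator(false)), 2);
        System.out.println(ok.events); // [failover-ready, coordinator-started, tasks-deployed]

        Scheduler bad = new Scheduler();
        bad.startScheduling(List.of(new CheckingCoordinator(true)), 2);
        System.out.println(bad.events); // [failover-ready, failGlobal: boom]
    }
}
```

Running main() shows the event order for a healthy startup and for a coordinator whose start() throws. In the real scheduler these steps may involve asynchronous calls and multiple threads, so this sketch only illustrates the required ordering, not how to enforce it concurrently.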

/cc [~chesnay]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
