Zhu Zhu created FLINK-14331:
-------------------------------

             Summary: Reset vertices right after they transition to terminated 
states
                 Key: FLINK-14331
                 URL: https://issues.apache.org/jira/browse/FLINK-14331
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Zhu Zhu
             Fix For: 1.10.0


Currently in DefaultScheduler, tasks that are to be restarted remain in a 
terminated state until they are re-scheduled by the SchedulingStrategy.
This behavior may cause 2 problems:
1. Failed/Canceled tasks may not be restartable under lazy scheduling. For 
example, the job A1--pipelined-->B1 fails, and only A1 is re-scheduled on 
restartTasks() because the inputs of B1 are not ready. B1 should be scheduled 
later, on the partition consumable event from the restarted A1, but the 
terminal state of B1 prevents it from being scheduled.
2. A task can stay in FAILED/CANCELED state for a long time if its inputs take 
a long time to become ready again. This is unfriendly to users and may cause 
confusion.
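
To make problem 1 concrete, here is a minimal toy model in Java. It is only a 
sketch under stated assumptions: ToyScheduler, State, restart(), 
scheduleOnInputConsumable() and the resetOnTermination flag are hypothetical 
names invented for illustration, not Flink classes or APIs. It assumes a lazy 
strategy that schedules a vertex on an input event only if the vertex is in 
CREATED state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, not a Flink class.
class ResetVerticesSketch {

    enum State {
        CREATED, SCHEDULED, FAILED, CANCELED;

        boolean isTerminal() {
            return this == FAILED || this == CANCELED;
        }
    }

    static class ToyScheduler {
        private final Map<String, State> states = new LinkedHashMap<>();
        private final boolean resetOnTermination;

        ToyScheduler(boolean resetOnTermination) {
            this.resetOnTermination = resetOnTermination;
        }

        void addVertex(String id) {
            states.put(id, State.CREATED);
        }

        // A vertex reaches a terminal state. With the proposed behavior it is
        // reset to CREATED right away instead of waiting to be re-scheduled.
        void onTerminated(String id, State terminal) {
            if (!terminal.isTerminal()) {
                throw new IllegalArgumentException("not a terminal state: " + terminal);
            }
            states.put(id, resetOnTermination ? State.CREATED : terminal);
        }

        // Direct restart of a vertex whose inputs are ready (A1 in the example):
        // reset it if it is still terminal, then schedule it.
        void restart(String id) {
            if (states.get(id).isTerminal()) {
                states.put(id, State.CREATED);
            }
            states.put(id, State.SCHEDULED);
        }

        // Lazy scheduling on a "partition consumable" event (assumed rule):
        // only a vertex in CREATED state is scheduled.
        boolean scheduleOnInputConsumable(String id) {
            if (states.get(id) != State.CREATED) {
                return false;
            }
            states.put(id, State.SCHEDULED);
            return true;
        }
    }

    // Replays the A1--pipelined-->B1 failure and reports whether B1 could be
    // scheduled once the restarted A1 makes its partition consumable.
    static boolean b1Rescheduled(boolean resetOnTermination) {
        ToyScheduler scheduler = new ToyScheduler(resetOnTermination);
        scheduler.addVertex("A1");
        scheduler.addVertex("B1");
        scheduler.onTerminated("A1", State.FAILED);
        scheduler.onTerminated("B1", State.CANCELED);
        scheduler.restart("A1"); // only A1 is re-scheduled on restart
        return scheduler.scheduleOnInputConsumable("B1");
    }

    public static void main(String[] args) {
        System.out.println("reset on re-schedule only: B1 rescheduled = " + b1Rescheduled(false));
        System.out.println("reset on termination: B1 rescheduled = " + b1Rescheduled(true));
    }
}
```

In this model, B1 stays CANCELED and is skipped when vertices are reset only 
on re-scheduling, while resetting on termination leaves it in CREATED state 
and ready for the later input event.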

That's why I propose resetting vertices right after they transition to 
terminated states.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
