Till Rohrmann created FLINK-10505:
-------------------------------------

Summary: Treat fail signal as scheduling event
Key: FLINK-10505
URL: https://issues.apache.org/jira/browse/FLINK-10505
Project: Flink
Issue Type: Sub-task
Components: Distributed Coordination
Affects Versions: 1.7.0
Reporter: Till Rohrmann
Fix For: 1.7.0
Instead of simply calling into the {{RestartStrategy}}, which restarts the existing {{ExecutionGraph}} with the same parallelism, the {{ExecutionGraphDriver}} should treat a recovery like the initial scheduling operation. First, it needs to decide on the new parallelism of the {{ExecutionGraph}} (scale up or scale down) with respect to the set of available resources. Only if the minimum configuration is fulfilled will the potentially rescaled {{ExecutionGraph}} be restarted.
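A minimal sketch of the proposed decision step, assuming a single job vertex, one slot per parallel subtask, and a job that declares a minimum and a maximum parallelism (the issue only speaks of a "minimum configuration"; the upper bound is an added assumption). The class `RescalingRecovery` and its methods are hypothetical illustrations, not part of Flink's API. On a failure, the driver recomputes the parallelism from the currently available slots and only restarts if the minimum is met; otherwise it keeps waiting for resources.

```java
import java.util.OptionalInt;

// Hypothetical sketch: treat a failure as a new scheduling event.
// Names are illustrative only and do not correspond to Flink classes.
final class RescalingRecovery {

    private RescalingRecovery() {}

    /**
     * Decides the parallelism to restart with, assuming each parallel
     * subtask needs exactly one slot. Returns empty if the minimum
     * parallelism cannot be satisfied with the available slots.
     */
    static OptionalInt decideParallelism(int minParallelism, int maxParallelism, int availableSlots) {
        if (minParallelism < 1 || maxParallelism < minParallelism || availableSlots < 0) {
            throw new IllegalArgumentException("invalid parallelism bounds or slot count");
        }
        if (availableSlots < minParallelism) {
            // Minimum configuration not fulfilled: do not restart yet.
            return OptionalInt.empty();
        }
        // Scale up or down to fit the available slots, capped at the maximum.
        return OptionalInt.of(Math.min(availableSlots, maxParallelism));
    }

    /** Hypothetical failure handler: rescale-and-restart instead of a plain restart. */
    static String onFailure(int minParallelism, int maxParallelism, int availableSlots) {
        OptionalInt p = decideParallelism(minParallelism, maxParallelism, availableSlots);
        if (p.isPresent()) {
            return "restart with parallelism " + p.getAsInt();
        }
        return "wait for more resources";
    }

    public static void main(String[] args) {
        System.out.println(onFailure(2, 8, 1));  // below minimum: wait
        System.out.println(onFailure(2, 8, 4));  // scale down to 4
        System.out.println(onFailure(2, 8, 16)); // capped at 8
    }
}
```

Keeping the decision as a pure function of the requirements and the available slots means the same logic could, in principle, serve both the initial scheduling and the recovery path, which is the symmetry the issue asks for.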