Truong Duc Kien created FLINK-9583:
--------------------------------------

             Summary: Wrong number of TaskManagers' slots after recovery.
                 Key: FLINK-9583
                 URL: https://issues.apache.org/jira/browse/FLINK-9583
             Project: Flink
          Issue Type: Bug
          Components: ResourceManager
    Affects Versions: 1.5.0
         Environment: Flink 1.5.0 on YARN with the default execution mode.
            Reporter: Truong Duc Kien
         Attachments: jm.log

We started a job with 120 slots, using a FixedDelayRestart strategy with a delay of 1 minute. During recovery, some but not all of the slots were released. When the job restarted, Flink requested a new batch of slots. The total number of slots is now 193, larger than the configured amount, but the excess slots are never released.

This bug does not happen in legacy mode.

I've attached the job manager log.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)