[ https://issues.apache.org/jira/browse/FLINK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Zhu updated FLINK-15456:
----------------------------
    Attachment: jm_part2.log

> Job keeps failing on slot allocation timeout due to RM not allocating new TMs
> for slot requests
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15456
>                 URL: https://issues.apache.org/jira/browse/FLINK-15456
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Priority: Blocker
>             Fix For: 1.10.0
>
>         Attachments: jm_part.log, jm_part2.log
>
>
> As shown in the attached JM log, the job tried to start 30 TMs but only 29
> registered, so the job failed because it could not acquire all 30 slots it
> needed in time.
> When the failover happens and the tasks are re-scheduled, the RM does not ask
> for new TMs even though it cannot fulfill the slot requests, so the job keeps
> failing on slot allocation timeouts.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
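The report implies an accounting rule: after failover, if the pending slot requests exceed the slots that are free or already on their way, the RM should request more TMs. The sketch below is a hypothetical model of that expected behavior only. It is not Flink's ResourceManager or SlotManager code, and it makes no claim about the actual root cause. The class name, its fields, and the one-slot-per-TM assumption are all illustrative.

```java
/**
 * Hypothetical model (NOT Flink's ResourceManager/SlotManager API) of the
 * behavior the report expects: when pending slot requests cannot be covered
 * by free slots plus slots from TMs already requested, ask for more TMs.
 */
public class SlotAccountingSketch {

    // Assumption: every TaskManager offers exactly one slot.
    static final int SLOTS_PER_TM = 1;

    int registeredTms;       // TMs that have registered with the RM
    int requestedPendingTms; // TMs requested but not yet registered
    int slotsInUse;          // slots currently assigned to tasks

    /** Returns how many additional TMs must be requested to cover the pending slot requests. */
    int tmsToRequest(int pendingSlotRequests) {
        int freeSlots = registeredTms * SLOTS_PER_TM - slotsInUse;
        int incomingSlots = requestedPendingTms * SLOTS_PER_TM;
        int uncovered = pendingSlotRequests - freeSlots - incomingSlots;
        if (uncovered <= 0) {
            return 0; // existing and incoming capacity is enough
        }
        // Round up so that a partially needed TM is still requested.
        return (uncovered + SLOTS_PER_TM - 1) / SLOTS_PER_TM;
    }

    public static void main(String[] args) {
        SlotAccountingSketch rm = new SlotAccountingSketch();
        // Scenario from the report: 29 of 30 TMs registered. After failover,
        // assume no slots are in use and no TM request is still outstanding.
        rm.registeredTms = 29;
        rm.requestedPendingTms = 0;
        rm.slotsInUse = 0;
        System.out.println(rm.tmsToRequest(30)); // prints 1
    }
}
```

Under these assumptions, the 30 re-scheduled slot requests leave one slot uncovered, so the model requests one new TM. The report describes the RM not doing this.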