[ https://issues.apache.org/jira/browse/FLINK-20138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231130#comment-17231130 ]
wgcn commented on FLINK-20138:
------------------------------

Hi [~trohrmann], please have a look at this issue.

> Flink Job can not recover due to timeout of requiring slots when flink jobmanager restarted
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20138
>                 URL: https://issues.apache.org/jira/browse/FLINK-20138
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Table SQL / Runtime
>         Environment: flink: 1.9.2
>                      hadoop: 2.7.2
>                      jdk: 1.8
>            Reporter: wgcn
>            Priority: Major
>         Attachments: 2820F7EE-85F9-441D-95D5-8163FB6267DF.png, jobmanager.log
>
>
> Our Flink jobs run on YARN in per-job mode. We stopped some NodeManager
> machines, and the AMs on those machines restarted on other NodeManagers.
> We found that some jobs cannot recover because their slot requests time out.
> *SlotPoolImpl never connected to the ResourceManager*
> ```
> 2020-11-09 16:31:31,794 INFO flink-akka.actor.default-dispatcher-16 (org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.stashRequestWaitingForResourceManager:369) - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{456c9daa6670a4490810f8e51f495174}]
> ```
> *1. We did not find any log of YarnResourceManager requesting containers in
> the attached jobmanager log.*
> *2. The ZooKeeper node is also shown in the attachment.*

--
This message was sent by Atlassian Jira
(v8.3.4#803005)