[ 
https://issues.apache.org/jira/browse/FLINK-20138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231130#comment-17231130
 ] 

wgcn commented on FLINK-20138:
------------------------------

Hi [~trohrmann], please have a look at this issue.

> Flink Job can not recover due to  timeout of requiring slots when flink 
> jobmanager restarted
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20138
>                 URL: https://issues.apache.org/jira/browse/FLINK-20138
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Table SQL / Runtime
>         Environment: flink : 1.9.2
> hadoop :2.7.2
> jdk:1.8
>            Reporter: wgcn
>            Priority: Major
>         Attachments: 2820F7EE-85F9-441D-95D5-8163FB6267DF.png, jobmanager.log
>
>
> Our Flink jobs run on YARN in per-job mode. We stopped some NodeManager 
> machines, and the AMs on those machines restarted on other NodeManagers. We 
> found that some jobs could not recover because their slot requests timed out.
> *SlotPoolImpl never connected to the ResourceManager:*
> ```
> 2020-11-09 16:31:31,794                           INFO flink-akka.actor.default-dispatcher-16 (org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.stashRequestWaitingForResourceManager:369) - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{456c9daa6670a4490810f8e51f495174}]
> ```
> *1. The attached jobmanager log contains no entry showing YarnResourceManager 
> requesting containers. 
> 2. The ZooKeeper node is also shown in the attachments.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
