Thanks Rong, I will follow that issue.
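
In the meantime, a possible mitigation (only a sketch, assuming our Flink version actually honors the setting; FLINK-10868 is precisely about the new ResourceManager not respecting it yet) is to cap container re-allocation in flink-conf.yaml:

    # Maximum number of containers Flink will re-allocate after failures
    # before giving up and failing the application. The value below is
    # illustrative, not a recommendation.
    yarn.maximum-failed-containers: 100

With such a cap in place, a persistent container start failure should fail the job instead of requesting new containers indefinitely.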

> On Mar 30, 2019, at 6:42 AM, Rong Rong <walter...@gmail.com> wrote:
> 
> Hi Qi,
> 
> I think this may be related to a similar problem reported in a previous 
> JIRA [1]. A PR is also under discussion.
> 
> Thanks,
> Rong
> 
> [1] https://issues.apache.org/jira/browse/FLINK-10868
> On Fri, Mar 29, 2019 at 5:09 AM qi luo <luoqi...@gmail.com> wrote:
> Hello,
> 
> Today we encountered an issue where our Flink job kept requesting YARN 
> containers indefinitely. As the JM log below shows, there were errors when 
> starting TMs (caused by underlying HDFS errors). The allocated containers 
> failed and the job kept requesting new ones. The failed containers were also 
> not returned to YARN, so the job quickly exhausted our YARN resources.
> 
> Is there any way we can avoid such behavior? Thank you!
> 
> ————————
> JM log:
> 
> INFO  org.apache.flink.yarn.YarnResourceManager                     - 
> Creating container launch context for TaskManagers
> INFO  org.apache.flink.yarn.YarnResourceManager                     - 
> Starting TaskManagers
> INFO  org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy 
>  - Opening proxy : xxx.yyy
> ERROR org.apache.flink.yarn.YarnResourceManager                     - Could 
> not start TaskManager in container container_e12345.
> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to 
> start container.
> ....
> INFO  org.apache.flink.yarn.YarnResourceManager                     - 
> Requesting new TaskExecutor container with resources <memory:16384, 
> vCores:4>. Number pending requests 19.
> INFO  org.apache.flink.yarn.YarnResourceManager                     - 
> Received new container: container_e195_1553781735010_27100_01_000136 - 
> Remaining pending container requests: 19
> ————————
> 
> Thanks,
> Qi
