Thanks Rong, I will follow that issue.
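
For anyone else hitting the same behavior: one possible interim mitigation (a
sketch only, untested) is to cap the number of container failures via
yarn.maximum-failed-containers in flink-conf.yaml, so the application fails fast
instead of requesting containers forever. Note that per FLINK-10868 the new
YarnResourceManager may not yet enforce this limit, which is what the pending PR
addresses.

————————
# flink-conf.yaml (sketch; values are illustrative)

# Give up after this many failed TM containers instead of retrying forever.
yarn.maximum-failed-containers: 10

# Optionally also cap how many times YARN may re-attempt the whole application.
yarn.application-attempts: 2
————————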
> On Mar 30, 2019, at 6:42 AM, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Qi,
>
> I think the problem may be related to a similar one reported in a previous
> JIRA [1], and a PR is also in discussion.
>
> Thanks,
> Rong
>
> [1] https://issues.apache.org/jira/browse/FLINK-10868
> On Fri, Mar 29, 2019 at 5:09 AM qi luo <luoqi...@gmail.com> wrote:
> Hello,
>
> Today we encountered an issue where our Flink job requested YARN containers
> infinitely. As shown in the JM log below, there were errors when starting TMs
> (caused by underlying HDFS errors). The allocated containers failed, and the
> job kept requesting new ones. The failed containers were also not returned to
> YARN, so the job quickly exhausted our YARN resources.
>
> Is there any way we can avoid such behavior? Thank you!
>
> ————————
> JM log:
>
> INFO org.apache.flink.yarn.YarnResourceManager -
> Creating container launch context for TaskManagers
> INFO org.apache.flink.yarn.YarnResourceManager -
> Starting TaskManagers
> INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy
> - Opening proxy : xxx.yyy
> ERROR org.apache.flink.yarn.YarnResourceManager - Could
> not start TaskManager in container container_e12345.
> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
> start container.
> ....
> INFO org.apache.flink.yarn.YarnResourceManager -
> Requesting new TaskExecutor container with resources <memory:16384,
> vCores:4>. Number pending requests 19.
> INFO org.apache.flink.yarn.YarnResourceManager -
> Received new container: container_e195_1553781735010_27100_01_000136 -
> Remaining pending container requests: 19
> ————————
>
> Thanks,
> Qi