[jira] [Comment Edited] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

Andrey Zagrebin (Jira) Thu, 05 Sep 2019 01:58:49 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923213#comment-16923213
 ]


Andrey Zagrebin edited comment on FLINK-13895 at 9/5/19 8:57 AM:
-----------------------------------------------------------------

From the logs, it looks like killing the application hangs because the client
cannot connect to the YARN cluster RM; this is a networking issue rather than
a Flink issue per se.

The ConfiguredRMFailoverProxyProvider could probably be reconfigured to do a
limited number of reconnection retries and so prevent the Flink CLI from
hanging.

From the source code of ConfiguredRMFailoverProxyProvider.init, it looks like
yarn.client.failover-retries is the option to tweak (assuming the default
value of zero means infinite retries, which seems likely). Not sure whether it
makes sense to change this option by default in Flink for YARN deployments.

[~yuwang0...@gmail.com] could you try setting yarn.client.failover-retries or
yarn.client.failover-retries-on-socket-timeouts (see
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml)
to some small value to see whether the reconnection attempts stop and the CLI
exits?
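
To make the suggestion concrete, here is a minimal sketch (Python, standard
library only) that sets both options in a yarn-site.xml document. Several
things here are assumptions rather than anything confirmed in this thread: that
the Flink client picks these options up from the yarn-site.xml in its Hadoop
configuration directory, the helper name set_yarn_properties, and the value 3,
which is only an illustrative "small value".

```python
import xml.etree.ElementTree as ET


def set_yarn_properties(xml_text, overrides):
    """Return yarn-site.xml content with the given properties set.

    Hypothetical helper: existing <property> entries are updated in place,
    missing ones are appended under the root <configuration> element.
    """
    root = ET.fromstring(xml_text)
    # Index existing properties by their <name> text.
    existing = {p.findtext("name"): p for p in root.findall("property")}
    for name, value in overrides.items():
        prop = existing.get(name)
        if prop is None:
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
        value_el = prop.find("value")
        if value_el is None:
            value_el = ET.SubElement(prop, "value")
        value_el.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The value 3 is an arbitrary small number chosen for illustration.
    updated = set_yarn_properties(
        "<configuration></configuration>",
        {
            "yarn.client.failover-retries": 3,
            "yarn.client.failover-retries-on-socket-timeouts": 3,
        },
    )
    print(updated)
```

If the CLI then exits after a bounded number of attempts, that would support
the theory that the hang comes from unbounded RM reconnection.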


> Client does not exit when bin/yarn-session.sh come fail
> -------------------------------------------------------
>
>                 Key: FLINK-13895
>                 URL: https://issues.apache.org/jira/browse/FLINK-13895
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.9.0
>            Reporter: Yu Wang
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: client_exit.txt
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Hadoop cluster environment runs Java 1.7, while Flink is compiled with
> JDK 1.8. I used bin/yarn-session.sh to submit it; the client then reports an
> error and does not exit. I found that the YARN application has already
> failed, so we should not need to kill the YARN application; we can just stop
> the YARN client. The attachment is the operation log.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
