[ https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923213#comment-16923213 ]
Andrey Zagrebin edited comment on FLINK-13895 at 9/5/19 8:57 AM:
-----------------------------------------------------------------

From the logs, it looks like killing the application hangs because the client cannot connect to the YARN cluster RM. That is a networking problem, not a Flink issue per se. The ConfiguredRMFailoverProxyProvider could probably be reconfigured to do a limited number of reconnection retries and prevent the Flink CLI from hanging.

From the source code of ConfiguredRMFailoverProxyProvider.init, it looks like yarn.client.failover-retries is the option to tweak (the default value of zero probably means infinite retries). Not sure whether it makes sense to tweak this option in Flink for YARN deployments by default.

[~yuwang0...@gmail.com] could you try to set [yarn.client.failover-retries or yarn.client.failover-retries-on-socket-timeouts|https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml] to some small value to see if reconnection attempts stop and the CLI exits?
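For reference, the suggestion above could be tried with a client-side override along these lines. This is only a sketch: the two property names come from the linked yarn-default.xml, but the value 3 is an arbitrary "small value" picked for illustration, and it assumes the override goes into the yarn-site.xml that the machine running bin/yarn-session.sh picks up (typically via HADOOP_CONF_DIR or YARN_CONF_DIR).

```xml
<!-- Fragment for the client-side yarn-site.xml; merge into the existing <configuration> element. -->
<configuration>
  <!-- Assumption: 3 is an arbitrary small value; the shipped default is 0. -->
  <property>
    <name>yarn.client.failover-retries</name>
    <value>3</value>
  </property>
  <!-- Same arbitrary value for retries after socket timeouts. -->
  <property>
    <name>yarn.client.failover-retries-on-socket-timeouts</name>
    <value>3</value>
  </property>
</configuration>
```

If the CLI then gives up after a few reconnection attempts instead of hanging, that would support the reading that the default of zero leads to unbounded retries.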
> Client does not exit when bin/yarn-session.sh come fail
> -------------------------------------------------------
>
> Key: FLINK-13895
> URL: https://issues.apache.org/jira/browse/FLINK-13895
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.9.0
> Reporter: Yu Wang
> Priority: Minor
> Labels: pull-request-available
> Attachments: client_exit.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The Hadoop cluster environment runs Java 1.7, while Flink is compiled with
> JDK 1.8. I submitted it with bin/yarn-session.sh; the client then hits an
> error and does not exit. I found that the YARN application had already
> failed, so we should not need to kill the YARN application; we can just
> stop the YARN client. The attachment is the operation log.

--
This message was sent by Atlassian Jira (v8.3.2#803003)