Nico Kruber created FLINK-10443: ----------------------------------- Summary: Flink Yarn CLI hanging after unsuccessful deployment Key: FLINK-10443 URL: https://issues.apache.org/jira/browse/FLINK-10443 Project: Flink Issue Type: Bug Components: Distributed Coordination, YARN Affects Versions: 1.5.4 Reporter: Nico Kruber
It looks like the Flink CLI is retrying forever to cancel a YARN deployment which has failed already (during deployment). Additionally, you cannot cancel the CLI via CTRL+C Reproduce via a failing deployment due to an inaccessible SSL key file: {code} // flink-conf.yaml security.ssl.enabled: true web.ssl.enabled: false security.ssl.keystore: /path/to/non-existing/internal.keystore security.ssl.keystore-password: internal_store_password security.ssl.key-password: internal_key_password security.ssl.truststore: /path/to/non-existing/internal.keystore security.ssl.truststore-password: internal_store_password {code} {code} ./bin/flink run -m yarn-cluster -p 2 ./examples/streaming/WordCount.jar --input /usr/share/doc/java-1.8.0-openjdk-devel-1.8.0.181/LICENSE {code} After the failure, the CLI will hang with repeatedly printing this: {code} 2018-09-26 15:17:50,348 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Retrying after sleeping for 30000ms. java.io.IOException: The client is stopped at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519) at org.apache.hadoop.ipc.Client.call(Client.java:1381) at org.apache.hadoop.ipc.Client.call(Client.java:1345) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:213) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) at com.sun.proxy.$Proxy9.forceKillApplication(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:439) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:419) at org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1236) at org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:111) at org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1493) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)