[ https://issues.apache.org/jira/browse/FLINK-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sunjincheng updated FLINK-10435:
--------------------------------
    Fix Version/s:     (was: 1.6.4)
                       1.6.5

> Client sporadically hangs after Ctrl + C
> ----------------------------------------
>
>                 Key: FLINK-10435
>                 URL: https://issues.apache.org/jira/browse/FLINK-10435
>             Project: Flink
>          Issue Type: Bug
>          Components: Client, YARN
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Gary Yao
>            Priority: Major
>             Fix For: 1.7.3, 1.8.0, 1.6.5
>
>
> When submitting a YARN job cluster in attached mode, the client hangs indefinitely if Ctrl + C is pressed at the right time. One can recover from this by sending SIGKILL.
>
> *Command to submit job*
> {code}
> HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
> {code}
>
> *Output/Stacktrace*
> {code}
> [hadoop@ip-172-31-45-22 flink-1.5.4]$ HADOOP_CLASSPATH=`hadoop classpath` bin/flink run -m yarn-cluster examples/streaming/WordCount.jar
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/hadoop/flink-1.5.4/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-09-26 12:01:04,241 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-172-31-45-22.eu-central-1.compute.internal/172.31.45.22:8032
> 2018-09-26 12:01:04,386 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-09-26 12:01:04,386 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-09-26 12:01:04,402 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-09-26 12:01:04,598 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=1, slotsPerTaskManager=1}
> 2018-09-26 12:01:04,972 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/home/hadoop/flink-1.5.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2018-09-26 12:01:07,857 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1537944258063_0017
> 2018-09-26 12:01:07,913 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1537944258063_0017
> 2018-09-26 12:01:07,913 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
> 2018-09-26 12:01:07,916 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> ^C2018-09-26 12:01:08,851 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
> 2018-09-26 12:01:08,854 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Killing YARN application
> ------------------------------------------------------------
>  The program finished with the following exception:
> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:410)
>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:258)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state KILLED during deployment.
> Diagnostics from YARN: Application application_1537944258063_0017 was killed by user hadoop at 172.31.45.22
> If log aggregation is enabled on your cluster, use this command to further investigate the issue:
> yarn logs -applicationId application_1537944258063_0017
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1059)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:532)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:403)
>     ... 9 more
> 2018-09-26 12:01:09,065 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Retrying after sleeping for 30000ms.
> java.io.IOException: The client is stopped
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1345)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:213)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>     at com.sun.proxy.$Proxy9.forceKillApplication(Unknown Source)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:439)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:419)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1236)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:111)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1493)
> {code}
>
> *Expected behavior*
> Client should shut down the YARN cluster and exit.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
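
The trace above suggests that the hang happens inside a JVM shutdown hook: DeploymentFailureHook.run calls YarnClientImpl.killApplication, the RPC layer reports "The client is stopped", and the retry handler then sleeps for 30000ms before trying again. Because the JVM waits for all shutdown hooks to return, a cleanup call that keeps retrying can block exit indefinitely. The following is a minimal, generic sketch of one way to bound such a cleanup step with a timeout. It is not the fix applied in Flink; the class BoundedShutdownHook, the method runWithTimeout, and the stuckKill stand-in are hypothetical names used only for illustration, and no YARN or Flink API is called.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedShutdownHook {

    /**
     * Runs a cleanup action on a daemon thread and waits at most timeoutMillis.
     * Returns true if the action completed in time, false otherwise.
     */
    public static boolean runWithTimeout(Runnable cleanup, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-cleanup");
            t.setDaemon(true); // a stuck cleanup must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(cleanup);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the blocked call
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } catch (ExecutionException e) {
            return false; // the cleanup itself threw
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a kill request stuck in a retry sleep.
        Runnable stuckKill = () -> {
            try {
                Thread.sleep(30_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        // The hook gives up after 500 ms instead of blocking JVM exit.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean done = runWithTimeout(stuckKill, 500);
            System.err.println(done ? "cleanup finished" : "cleanup timed out, exiting anyway");
        }));
    }
}
```

With this shape, pressing Ctrl + C still triggers the cleanup attempt, but the hook returns once the timeout elapses, so SIGKILL would not be needed to recover. The trade-off is that the remote application may be left running if the kill request never got through.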