What's in the container log for the container that failed?

On Sep 11, 2017 2:17 AM, "Sridhar Chellappa" <flinken...@gmail.com> wrote:
I am trying to start Flink (version 1.3.0) on YARN (Hadoop 2.8.1) by issuing the following command:

~/flink-1.3.0/bin/yarn-session.sh -s 4 -n 10 -jm 4096 -tm 4096 -d

I am seeing a flurry of these errors:

2017-09-11 08:17:11,410 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2017-09-11 08:17:11,661 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2017-09-11 08:17:11,912 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2017-09-11 08:17:12,163 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

And then my deployment fails with the following exception:

Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:439)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:630)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:486)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:483)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:483)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1504851547322_0003 failed 2 times due to AM Container for appattempt_1504851547322_0003_000002 exited with exitCode: 31
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1504851547322_0003_02_000001
Exit code: 31
Stack trace: ExitCodeException exitCode=31:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Further debugging in the JobManager logs shows:

Resetting connection and trying again with a new connection.
2017-09-11 08:17:11,820 INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=high-availability.zookeeper.quorum: 10.200.0.6:2181,10.200.0.7:2181,10.200.0.9:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.org.apache.curator.ConnectionState@57bd802b
2017-09-11 08:17:11,927 ERROR org.apache.flink.yarn.YarnApplicationMasterRunner - YARN Application Master initialization failed
java.net.UnknownHostException: high-availability.zookeeper.quorum: 10.200.0.6: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

Any help in figuring this out will be appreciated.
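
One observation on the log above, offered as a reading of the output and not a confirmed diagnosis: the connectString passed to ZooKeeper starts with the text "high-availability.zookeeper.quorum: ", and the UnknownHostException is raised for exactly that text. A ZooKeeper connect string is normally a comma-separated list of host:port entries, so it may be worth checking how that value is set in the Flink configuration. The sketch below is a small standalone check of a connect string; the helper name invalid_quorum_entries is hypothetical and is not part of Flink or ZooKeeper, and it deliberately ignores the optional chroot path suffix that ZooKeeper also accepts.

```python
import re

# Hypothetical helper for illustration only; not a Flink or ZooKeeper API.
# Accepts entries of the form host:port, where host is a hostname or IPv4
# address. The optional ZooKeeper chroot suffix ("/path") is not handled.
_HOST_PORT = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def invalid_quorum_entries(connect_string):
    """Return the comma-separated entries that are not plain host:port."""
    return [e for e in connect_string.split(",") if not _HOST_PORT.match(e)]


# The value exactly as it appears after "connectString=" in the log above.
logged = ("high-availability.zookeeper.quorum: "
          "10.200.0.6:2181,10.200.0.7:2181,10.200.0.9:2181")

# The same three servers without the leading key text.
hosts_only = "10.200.0.6:2181,10.200.0.7:2181,10.200.0.9:2181"

print(invalid_quorum_entries(logged))      # first entry is flagged
print(invalid_quorum_entries(hosts_only))  # nothing is flagged
```

With the logged value, only the first entry is flagged, which matches the hostname named in the UnknownHostException.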