Hi Sysuke,

Could you check the JM log (the YARN AM container log) first? You might find the direct failure message there.
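In detached mode the client returns before the AM is fully up, so the CLI output alone often misses the root cause. If log aggregation is enabled on your cluster, the YARN CLI should be able to fetch the full AM container log, using the application id from your diagnostics below (a sketch; adjust to your environment):

    yarn logs -applicationId application_1533815330295_30183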
Thanks,
Biao /'bɪ.aʊ/

On Fri, 17 Jan 2020 at 12:02, sysuke Lee <sysuke...@gmail.com> wrote:

> Hi all,
> We've got a jar with hadoop configuration files in it.
>
> Previously we used blocking mode to deploy jars on YARN, and they ran well.
> Recently we found that the client process occupies more and more memory, so
> we tried detached mode, but the job failed to deploy with the following
> error information:
>
> The program finished with the following exception:
>
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
>     at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:82)
>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:230)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
> Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException:
> The YARN application unexpectedly switched to state FAILED during deployment.
> Diagnostics from YARN: Application application_1533815330295_30183 failed 2
> times due to AM Container for appattempt_xxxx exited with exitCode: 1
> For more detailed output, check application tracking page: http:xxxx
> Then, click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e05_xxxx
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:593)
>     at org.apache.hadoop.util.Shell.run(Shell.java:490)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:784)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:298)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:324)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Shell output: main : command provided 1
> main : user is streams
> main : requested yarn user is user1
>
> Then I found this email,
> http://mail-archives.apache.org/mod_mbox/flink-user/201901.mbox/<tencent_0301f26148ceee21005e9...@qq.com>
> and set *yarn.per-job-cluster.include-user-jar: LAST*; after that, part of
> our jobs could be deployed as expected.
>
> But for some jobs that need to operate on another HDFS, with hadoop conf
> files for it in their jars, there is still a problem: the JobManager cannot
> resolve the HDFS domain name. I guess this is because the hadoop conf files
> in the jar are loaded instead of the conf files in the client's hadoop dir.
>
> Can anyone here help?
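PS: if the bundled core-site.xml/hdfs-site.xml in the jar really are shadowing the cluster ones, a quick way to confirm is to print which resources Hadoop actually loaded and which fs.defaultFS wins. A minimal sketch using the plain Hadoop API (the class name is just illustrative, not from your job):

    import org.apache.hadoop.conf.Configuration;

    public class ConfProbe {
        public static void main(String[] args) {
            // new Configuration() loads core-default.xml / core-site.xml
            // from the classpath; its toString() lists which resources won
            // (the copy inside the jar vs. the one from HADOOP_CONF_DIR).
            Configuration conf = new Configuration();
            System.out.println(conf);                     // loaded resources
            System.out.println(conf.get("fs.defaultFS")); // effective default FS
        }
    }

Running this with the same classpath as the failing job (or calling the same two lines early in the job's main()) should show which configuration the JM ends up with.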