Hi all,

We've got a jar with Hadoop configuration files bundled in it. Previously we used blocking mode to deploy jobs on YARN, and they ran well. Recently we noticed that the client process occupies more and more memory, so we tried detached mode instead, but the job fails to deploy.
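For reference, we submit roughly like this (the jar name is a placeholder, and I'm omitting our memory/parallelism options since they don't seem to matter):

    # attached (blocking) mode - what we used before; the client stays up:
    ./bin/flink run -m yarn-cluster ./our-job.jar

    # detached mode - what fails now; the client returns after submission:
    ./bin/flink run -m yarn-cluster -d ./our-job.jar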
The program finished with the following exception:

org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
    at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:82)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:230)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1533815330295_30183 failed 2 times due to AM Container for appattempt_xxxx exited with exitCode: 1
For more detailed output, check application tracking page: http:xxxx Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_xxxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:593)
    at org.apache.hadoop.util.Shell.run(Shell.java:490)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:784)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:298)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:324)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Shell output:
main : command provided 1
main : user is streams
main : requested yarn user is user1

Then I found this email, http://mail-archives.apache.org/mod_mbox/flink-user/201901.mbox/<tencent_0301f26148ceee21005e9...@qq.com> , and set *yarn.per-job-cluster.include-user-jar: LAST*. After that, part of our jobs could be deployed as expected. But some jobs need to access a second HDFS cluster and bundle its Hadoop conf files in their jars, and those still fail: the JobManager cannot resolve the HDFS domain name. My guess is that the Hadoop conf files inside the jar are loaded instead of the conf files in the client's Hadoop conf dir. Can anyone help?
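For reference, the setting from that thread goes in flink-conf.yaml:

    yarn.per-job-cluster.include-user-jar: LAST

And here is a minimal sketch of what I suspect happens on the JobManager side (please correct me if I'm wrong): Hadoop's Configuration loads core-site.xml / hdfs-site.xml from the first classpath entry that contains them, so the copies bundled in our fat jar can shadow the ones under the cluster's Hadoop conf dir. The "nameservice2" URI below is hypothetical, standing in for our second cluster:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ConfResolution {
        public static void main(String[] args) throws Exception {
            // new Configuration() loads core-default.xml and core-site.xml
            // from the classpath; hdfs-site.xml is added the same way once
            // HDFS classes are on the path. Whichever classpath entry holds
            // the file first wins, so the copies inside a fat jar can shadow
            // the ones under the client's HADOOP_CONF_DIR.
            Configuration conf = new Configuration();
            // "nameservice2" is a hypothetical HA nameservice of the second
            // cluster; resolving it requires the matching hdfs-site.xml
            // entries (dfs.nameservices, dfs.ha.namenodes.*, ...) to come
            // from the copy of the conf that actually gets loaded.
            FileSystem fs = FileSystem.get(URI.create("hdfs://nameservice2/"), conf);
            System.out.println(fs.getUri());
        }
    }

If that reading is right, the question is how to keep the second cluster's conf visible to the job while still letting the deployment use the client's conf.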