Hi Sysuke,

Could you check the JM log (YARN AM container log) first?
You might find the direct failure message there.
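If log aggregation is enabled on your cluster, you can also pull the AM log from the command line; the application id below is copied from the diagnostics in your mail:

```shell
# Fetch the aggregated YARN logs for the failed application.
# Requires yarn.log-aggregation-enable=true on the cluster;
# otherwise check the NodeManager's local container log dir instead.
yarn logs -applicationId application_1533815330295_30183 | less
```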

Thanks,
Biao /'bɪ.aʊ/



On Fri, 17 Jan 2020 at 12:02, sysuke Lee <sysuke...@gmail.com> wrote:

> Hi all,
> We've got a jar with Hadoop configuration files in it.
>
> Previously we used blocking mode to deploy jars on YARN, and they ran well.
> Recently we found that the client process occupies more and more memory, so
> we tried detached mode, but the job failed to deploy with the following
> error information:
>
> The program finished with the following exception:
>
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not 
> deploy Yarn job cluster.
>         at 
> org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:82)
>         at 
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:230)
>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>         at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
>         at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>         at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
> Caused by: 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: 
> The YARN application unexpectedly switched to state FAILED during deployment.
> Diagnostics from YARN: Application application_1533815330295_30183 failed 2 
> times due to AM Container for appattempt_xxxx exited with  exitCode: 1
> For more detailed output, check application tracking page:http:xxxxThen, 
> click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e05_xxxx
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:593)
>         at org.apache.hadoop.util.Shell.run(Shell.java:490)
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:784)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:298)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:324)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> Shell output: main : command provided 1
> main : user is streams
> main : requested yarn user is user1
>
>
> Then I found this email,
> http://mail-archives.apache.org/mod_mbox/flink-user/201901.mbox/<tencent_0301f26148ceee21005e9...@qq.com>
> , and set *yarn.per-job-cluster.include-user-jar: LAST*; after that, some
> of our jobs could be deployed as expected.
>
> But for some jobs that need to access another HDFS, with Hadoop conf files
> bundled in them, there is still a problem: the JobManager cannot resolve
> the HDFS domain name. I guess it's because the Hadoop conf file in the jar
> is loaded instead of the conf file in the client's Hadoop conf dir.
>
> Can someone help?
>
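
As a quick check on the second issue (a bundled Hadoop conf shadowing the client's `HADOOP_CONF_DIR`), you could list the jar's contents and look for site files; the jar path here is a placeholder:

```shell
# List the job jar and look for bundled Hadoop site files that could
# shadow the cluster configuration (jar path is a placeholder).
jar tf your-job.jar | grep -E '(core|hdfs|yarn)-site\.xml'
```

If those files turn up, excluding them from the shaded jar (or keeping them out of the classpath ahead of the cluster conf) may be worth trying.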
