Ah, thanks Yang for the fixup. I misunderstood the original answer.

Thanks,
Biao /'bɪ.aʊ/



On Fri, 17 Jan 2020 at 16:39, Yang Wang <danrtsey...@gmail.com> wrote:

> Hi sysuke,
>
> >> Why does the Yarn per-job attach mode work, but detach mode does not?
> It is just because, in 1.9 and earlier versions, per-job mode has very
> different code paths for attach and detach mode. In attach mode, the
> Flink client starts a session cluster and then submits the job to that
> existing session, so all the user jars are loaded by the user
> classloader, not the system classloader. In detach mode, all the jars
> are shipped as YARN local resources and appended to the system classpath
> of the JobManager and TaskManager. This behavior changes in 1.10: both
> detach and attach mode will always be real per-job, no longer simulated
> by a session. You could check FLIP-82 for more information [1].
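>
> A quick way to confirm which path is biting you (a sketch; the jar name
> is a placeholder): check whether your job jar actually bundles the
> Hadoop configuration files, since in 1.9 detach mode that jar lands on
> the system classpath:
>
> jar tf my-job.jar | grep -E 'core-site.xml|hdfs-site.xml'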
>
> >> How to fix this problem?
> 1. If your YARN cluster supports multiple HDFS clusters, then you do not
> need to add the HDFS configuration to your jar. That is how we use it in
> our production environment.
> 2. If you cannot change this and you will use Flink 1.10, then you could
> set `yarn.per-job-cluster.include-user-jar: DISABLED`. The user jars will
> then not be added to the system classpath; instead, they will be loaded
> by the user classloader. This is a new feature in 1.10. Check more
> information here [2].
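>
> Besides setting it in flink-conf.yaml, the option could be passed per
> submission (a sketch; the jar name is a placeholder, and it assumes
> -yD dynamic properties reach the client-side deployment configuration):
>
> flink run -m yarn-cluster -d \
>   -yD yarn.per-job-cluster.include-user-jar=DISABLED \
>   my-job.jar
>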
> 3. If you are still using 1.9 or an earlier version, move the HDFS
> configuration out of your jar. Then use `-yt` to ship your Hadoop
> configuration and reset the Hadoop env:
> -yt /path/of/my-hadoop-conf
> -yD containerized.master.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf'
> -yD containerized.taskmanager.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf'
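>
> Putting option 3 together in one detached submission (a sketch; the
> paths and jar name are placeholders; the single quotes keep $PWD from
> being expanded by your local shell, so it is resolved inside the YARN
> container's working directory at launch time):
>
> flink run -m yarn-cluster -d \
>   -yt /path/of/my-hadoop-conf \
>   -yD containerized.master.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf' \
>   -yD containerized.taskmanager.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf' \
>   my-job.jar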
>
>
> [1].
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-82%3A+Use+real+per-job+mode+for+YARN+per-job+attached+execution
> [2]. https://issues.apache.org/jira/browse/FLINK-13993
>
>
> Best,
> Yang
>
> sysuke Lee <sysuke...@gmail.com> wrote on Fri, 17 Jan 2020 at 12:02:
>
>> Hi all,
>> We've got a jar with hadoop configuration files in it.
>>
>> Previously we used blocking (attach) mode to deploy jars on YARN, and
>> they ran well. Recently we found that the client process occupies more
>> and more memory, so we tried detached mode, but the job failed to
>> deploy with the following error information:
>>
>> The program finished with the following exception:
>>
>> org.apache.flink.client.deployment.ClusterDeploymentException: Could not 
>> deploy Yarn job cluster.
>>         at 
>> org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:82)
>>         at 
>> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:230)
>>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>>         at 
>> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
>>         at 
>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>>         at 
>> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>         at 
>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
>> Caused by: 
>> org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: 
>> The YARN application unexpectedly switched to state FAILED during deployment.
>> Diagnostics from YARN: Application application_1533815330295_30183 failed 2 
>> times due to AM Container for appattempt_xxxx exited with  exitCode: 1
>> For more detailed output, check application tracking page:http:xxxxThen, 
>> click on links to logs of each attempt.
>> Diagnostics: Exception from container-launch.
>> Container id: container_e05_xxxx
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:593)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:490)
>>         at 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:784)
>>         at 
>> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:298)
>>         at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:324)
>>         at 
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Shell output: main : command provided 1
>> main : user is streams
>> main : requested yarn user is user1
>>
>>
>> Then I found this email,
>> http://mail-archives.apache.org/mod_mbox/flink-user/201901.mbox/<tencent_0301f26148ceee21005e9...@qq.com>
>> , and set *yarn.per-job-cluster.include-user-jar: LAST*; after that,
>> part of our jobs could be deployed as expected.
>>
>> But for the jobs that need to access another HDFS cluster and have
>> Hadoop conf files in their jars, there is still a problem: the
>> JobManager cannot resolve the HDFS domain name. I guess it is because
>> the Hadoop conf files in the jar are loaded instead of the conf files
>> in the client Hadoop conf dir.
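>>
>> One way to check this guess, using the application id from the
>> diagnostics above, is to grep the classpath the JobManager actually
>> started with:
>>
>> yarn logs -applicationId application_1533815330295_30183 | grep -i classpath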
>>
>> Is there someone here who can help?
>>
>
