Dominique: Which Hadoop release are you using? Please pastebin the classpath.
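For reference, a quick way to capture both on the gateway host (just a sketch, assuming the hadoop CLI from your CDH installation is on the PATH):

  hadoop version
  hadoop classpath | tr ':' '\n' > /tmp/hadoop-classpath.txt

The second command expands the classpath to one entry per line, which makes it easier to pastebin and to see which hadoop-common jar is actually picked up.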
Cheers

On Thu, May 11, 2017 at 7:27 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Dominique,
>
> I'm not exactly sure, but this looks more like a Hadoop or a Hadoop
> configuration problem to me. Could it be that the Hadoop version you're
> running does not support the specification of multiple KMS servers via
> kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms?
>
> Cheers,
> Till
>
> On Thu, May 11, 2017 at 4:06 PM, Dominique Rondé <dominique.ro...@allsecur.de> wrote:
>
>> Dear all,
>>
>> I ran into some trouble starting Flink in a YARN container on a Cloudera
>> cluster. My start script looks like this:
>>
>> slaxxxx:/applvg/home/flink/mvp $ cat run.sh
>> export FLINK_HOME_DIR=/applvg/home/flink/mvp/flink-1.2.0/
>> export FLINK_JAR_DIR=/applvg/home/flink/mvp/cache
>> export YARN_CONF_DIR=/etc/hadoop/conf
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>
>> /applvg/home/flink/mvp/flink-1.2.0/bin/yarn-session.sh -n 4 -s 3 -st -jm 2048 -tm 2048 -qu root.mr-spark.avp -d
>>
>> Executing the script produces the following output:
>>
>> sla09037:/applvg/home/flink/mvp $ ./run.sh
>> 2017-05-11 15:13:24,541 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
>> 2017-05-11 15:13:24,542 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
>> 2017-05-11 15:13:24,542 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
>> 2017-05-11 15:13:24,543 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
>> 2017-05-11 15:13:24,543 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
>> 2017-05-11 15:13:24,543 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
>> 2017-05-11 15:13:24,543 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
>> 2017-05-11 15:13:24,543 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
>> 2017-05-11 15:13:24,571 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
>> 2017-05-11 15:13:24,572 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
>> 2017-05-11 15:13:25,000 INFO  org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to fl...@companyde.rootdom.net (auth:KERBEROS)
>> 2017-05-11 15:13:25,030 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
>> 2017-05-11 15:13:25,030 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
>> 2017-05-11 15:13:25,030 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
>> 2017-05-11 15:13:25,030 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
>> 2017-05-11 15:13:25,031 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
>> 2017-05-11 15:13:25,031 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
>> 2017-05-11 15:13:25,031 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
>> 2017-05-11 15:13:25,031 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
>> 2017-05-11 15:13:25,050 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Using values:
>> 2017-05-11 15:13:25,051 INFO  org.apache.flink.yarn.YarnClusterDescriptor -   TaskManager count = 4
>> 2017-05-11 15:13:25,051 INFO  org.apache.flink.yarn.YarnClusterDescriptor -   JobManager memory = 2048
>> 2017-05-11 15:13:25,051 INFO  org.apache.flink.yarn.YarnClusterDescriptor -   TaskManager memory = 2048
>> 2017-05-11 15:13:25,903 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2017-05-11 15:13:25,962 WARN  org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/applvg/home/flink/mvp/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
>> 2017-05-11 15:13:25,972 INFO  org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/lib
>> 2017-05-11 15:13:27,522 INFO  org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/log4j.properties to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/log4j.properties
>> 2017-05-11 15:13:27,552 INFO  org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/logback.xml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/logback.xml
>> 2017-05-11 15:13:27,584 INFO  org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-dist_2.11-1.2.0.jar
>> 2017-05-11 15:13:28,508 INFO  org.apache.flink.yarn.Utils - Copying from /applvg/home/flink/mvp/flink-1.2.0/conf/flink-conf.yaml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-conf.yaml
>> 2017-05-11 15:13:28,553 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Adding delegation token to the AM container..
>> 2017-05-11 15:13:28,563 INFO  org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 27247 for flink on ha-hdfs:nameservice1
>> Error while deploying YARN cluster: Couldn't deploy Yarn cluster
>> java.lang.RuntimeException: Couldn't deploy Yarn cluster
>>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:421)
>>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
>>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
>>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
>>         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
>> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: lfrar256.srv.company;lfrar257.srv.company
>>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>>         at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationTokenService(KMSClientProvider.java:823)
>>         at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:779)
>>         at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
>>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
>>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>>         at org.apache.flink.yarn.Utils.setTokensFor(Utils.java:154)
>>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:753)
>>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:419)
>>         ... 9 more
>> Caused by: java.net.UnknownHostException: lfrarXXX1.srv.company;lfrarXXX2.srv.company
>>         ... 20 more
>>
>> Flink seems to pick these hosts up from here:
>>
>> slaxxxxx:/applvg/home/flink/mvp $ grep -r "lfrarXXX1.srv.company;lfrarXXX2.srv.company" /etc/hadoop/conf
>> /etc/hadoop/conf/core-site.xml:    <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
>> /etc/hadoop/conf/hdfs-site.xml:    <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
>>
>> So my guess is that Flink takes this connection string from the Cloudera
>> configuration and "forgets" to split it at the ";". If I ping each of the
>> two hosts individually, everything works.
>>
>> Maybe you have some hints on how to avoid this problem?
>>
>> Best wishes
>> Dominique
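For what it's worth, the symptom can be illustrated without Flink at all. A rough sketch, assuming getent is available on the gateway host and using the anonymized host names from the mail:

  getent hosts lfrarXXX1.srv.company
  getent hosts lfrarXXX2.srv.company
  getent hosts 'lfrarXXX1.srv.company;lfrarXXX2.srv.company'

The first two lookups should succeed, while the last one fails because the semicolon-separated value is treated as one single hostname, which is exactly what the UnknownHostException in the stack trace shows. Whether the client side knows how to handle a multi-host KMS URI when building the delegation token service depends on the Hadoop libraries actually on the classpath, which lines up with Till's suspicion and is why the release and classpath would help here.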