Hi,

I’m trying to run Flink on YARN with the following command:

    bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar
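
For reference, -yn 2 requests two TaskManagers and -ys 4 gives each of them four task slots, so the session should offer 2 × 4 = 8 slots. This matches the parallelism of 8 that the client reports in the log below. A minimal Python sketch of that arithmetic (total_slots is only an illustrative helper, not a Flink API):

```python
# Minimal sketch: total task slots of a Flink-on-YARN session is
# (number of TaskManagers) x (slots per TaskManager).
# total_slots is a hypothetical helper for illustration, not part of Flink.

def total_slots(num_taskmanagers: int, slots_per_taskmanager: int) -> int:
    """Return the total number of task slots offered by the YARN session."""
    return num_taskmanagers * slots_per_taskmanager

# -yn 2 (TaskManagers) and -ys 4 (slots per TaskManager) from the command
slots = total_slots(2, 4)
print(slots)
```

When no parallelism is passed to bin/flink, the client uses the slot count of the remote cluster, which is why the log shows 8.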

But I got the following error:


2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor -         TaskManager count = 2
2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor -         JobManager memory = 1024
2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor -         TaskManager memory = 1024
2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml
2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib
2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties
2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar
2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml
2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1487247313588_0017
2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1487247313588_0017
2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor - YARN application has been deployed successfully.
Cluster started: Yarn cluster with application id application_1487247313588_0017
Using address 10.199.202.162:43809 to connect to JobManager.
JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/
Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.
Starting execution of program
2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient - Starting program in interactive mode
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient - Waiting until all TaskManagers have connected
Waiting until all TaskManagers have connected
2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient - Starting client actor system.

------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)
Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client
    at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)
    at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)
    at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
    at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)
    at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
    at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
    at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:510)
    ... 6 more
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:127)
    at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:645)
    at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:237)
    ... 21 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at scala.concurrent.Await.result(package.scala)
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:125)
    ... 23 more

2017-02-17 15:53:20,084 INFO  org.apache.flink.yarn.YarnClusterClient - Sending shutdown request to the Application Master
2017-02-17 15:53:20,085 INFO  org.apache.flink.yarn.YarnClusterClient - Start application client.
2017-02-17 15:53:20,088 WARN  org.apache.flink.yarn.YarnClusterClient - YARN reported application state FAILED
2017-02-17 15:53:20,089 WARN  org.apache.flink.yarn.YarnClusterClient - Diagnostics: Application application_1487247313588_0017 failed 1 times due to AM Container for appattempt_1487247313588_0017_000001 exited with exitCode: -103
For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0017 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=18733,containerID=container_1487247313588_0017_01_000001] is running beyond virtual memory limits. Current usage: 264.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1487247313588_0017_01_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 18740 18733 18733 18733 (java) 955 64 2298933248 67430 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner
    |- 18733 18731 18733 18733 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner 1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.err

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.

2017-02-17 15:53:20,102 INFO  org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.
2017-02-17 15:53:20,106 INFO  org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-02-17 15:53:20,107 INFO  org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.
2017-02-17 15:53:20,108 INFO  org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
2017-02-17 15:53:20,112 INFO  org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.
Listening for transport dt_socket at address: 5006
2017-02-17 15:53:20,624 INFO  org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.
2017-02-17 15:53:21,124 INFO  org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-02-17 15:53:21,645 INFO  org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.
2017-02-17 15:53:22,145 INFO  org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-02-17 15:53:23,165 INFO  org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-02-17 15:53:23,664 INFO  org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.
2017-02-17 15:53:24,185 INFO  org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-02-17 15:53:25,204 INFO  org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
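
For reference, the figures in the container diagnostics can be reproduced from the process-tree dump. The sketch below is only a cross-check under stated assumptions: 4 KiB memory pages (the dump reports RSS in pages), YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1 (which would give the 2.1 GB virtual limit for the 1 GB container), and Flink 1.1's default YARN heap cutoff (yarn.heap-cutoff-ratio 0.25, yarn.heap-cutoff-min 600 MB), which would account for -Xmx424M out of the 1024 MB JobManager memory. The actual settings on this cluster may differ.

```python
# Minimal sketch: cross-check the YARN container diagnostics against the
# process-tree dump. Assumptions (not stated in the log):
#   - 4 KiB memory pages
#   - yarn.nodemanager.vmem-pmem-ratio = 2.1 (YARN default)
#   - Flink 1.1 YARN heap cutoff defaults: ratio 0.25, minimum 600 MB

PAGE_SIZE = 4096          # bytes per page (assumed)
VMEM_PMEM_RATIO = 2.1     # assumed YARN default
MiB = 1024 ** 2
GiB = 1024 ** 3

# (VMEM_USAGE in bytes, RSSMEM_USAGE in pages) per process, copied from the dump
processes = {
    "java (pid 18740)": (2298933248, 67430),
    "bash (pid 18733)": (108605440, 334),
}

# YARN sums usage over every process in the container's tree
vmem_bytes = sum(vmem for vmem, _ in processes.values())
rss_bytes = sum(pages for _, pages in processes.values()) * PAGE_SIZE

container_pmem_mb = 1024  # "JobManager memory = 1024"
vmem_limit_gb = container_pmem_mb / 1024 * VMEM_PMEM_RATIO

print(f"physical: {rss_bytes / MiB:.1f} MB of {container_pmem_mb / 1024:.0f} GB")
print(f"virtual:  {vmem_bytes / GiB:.1f} GB of {vmem_limit_gb:.1f} GB")
print("over virtual limit:", vmem_bytes / GiB > vmem_limit_gb)


def jm_heap_mb(memory_mb: int, cutoff_ratio: float = 0.25, cutoff_min_mb: int = 600) -> int:
    """Heap left after subtracting the (assumed) Flink 1.1 YARN heap cutoff."""
    cutoff = max(int(memory_mb * cutoff_ratio), cutoff_min_mb)
    return memory_mb - cutoff


print(f"-Xmx{jm_heap_mb(container_pmem_mb)}M")
```

Summing both processes in the tree gives about 264.7 MB resident and 2.2 GB virtual, the same figures YARN reports before killing the container.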

The main error is:

org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

It may be related to https://issues.apache.org/jira/browse/FLINK-2821. According to that issue, Akka addresses should always contain IP addresses rather than hostnames. However, I find a hostname in the Akka address used by the leaderRetrievalService.
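
To check which form a given Akka address takes, its host part can be tested for being an IP literal. A minimal Python sketch (the helper akka_host_is_ip and the hostname-based example address are hypothetical; the function only inspects the address string and does not contact Flink or Akka):

```python
# Minimal sketch: decide whether the host in an Akka address such as
# akka.tcp://flink@host:port/user/jobmanager is an IP literal or a hostname.
# akka_host_is_ip is a hypothetical helper, not a Flink or Akka API.
import ipaddress
from urllib.parse import urlsplit


def akka_host_is_ip(address: str) -> bool:
    """Return True if the host part of the Akka address is an IP literal."""
    # urlsplit accepts "akka.tcp" as a scheme and strips "flink@" and ":port"
    host = urlsplit(address).hostname
    if host is None:
        raise ValueError(f"no host in {address!r}")
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


# Address taken from the log
print(akka_host_is_ip("akka.tcp://flink@10.199.202.162:43809/user/jobmanager"))
# Hypothetical hostname-based address
print(akka_host_is_ip("akka.tcp://flink@jobmanager.example.com:43809/user/jobmanager"))
```

For the JobManager address printed by the ApplicationClient in the log, this returns True; a hostname-based address returns False.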

This problem won’t appear in 1.1.4.

Thank you all.

Howard