Hi Till,

The "9-96-101-177" is just the hostname. I reran Flink on YARN; the jobmanager.log is below, and sorry that I had to redact some sensitive entries. By the way, the ports (the other question from my earlier mail) again appear to be random values (this time 38785 and 35699); a sketch of what I might try for that is right after this paragraph, before the log. I am using Flink 1.2.0, and it works well on the plain Apache open-source YARN. Here, however, I am running Flink on a special YARN that has had some security hardening applied on top of Apache YARN, and I have no idea what the ERROR at the end of the log means.
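In case it helps with the port question, the only knob I have found so far is sketched below. This is just a guess based on my reading of the Flink 1.2 YARN setup documentation; the option name yarn.application-master.port and the port-range syntax are my assumption from those docs (they are not in my current flink-conf.yaml), and the 39180-39190 range is arbitrary. I have not verified any of this on the hardened YARN yet.

    # Sketch only: pin the ApplicationMaster / JobManager RPC port to a known range
    # instead of letting the YARN session pick a random one.
    # "yarn.application-master.port" and the range syntax are assumptions taken from
    # the Flink 1.2 YARN docs; the concrete range below is arbitrary.
    echo "yarn.application-master.port: 39180-39190" >> conf/flink-conf.yaml

    # Restart the session as before so the new setting is picked up.
    ./bin/yarn-session.sh -n 4

If this works, at least the RPC port should become predictable; I do not know yet whether the web port (35699 this time) can be pinned in the same way.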
Jobmanager.log:

2017-01-14 09:24:35,584 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - --------------------------------------------------------------------------------
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Starting YARN ApplicationMaster / ResourceManager / JobManager (Version: 1.2.0, Rev:82b1079, Date:04.01.2017 @ 17:38:23 CST)
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Current user: admin
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.112-b15
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Maximum heap size: 406 MiBytes
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - JAVA_HOME: /opt/huawei/Bigdata/jdk1.8.0_112/
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Hadoop version: 2.7.2
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - JVM Options:
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner -    -Xmx424M
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner -    -Dlog.file=/srv/BigData/hadoop/data1/nm/containerlogs/application_1483499303549_0043/container_1483499303549_0043_01_000001/jobmanager.log
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner -    -Dlogback.configurationFile=file:logback.xml
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner -    -Dlog4j.configuration=file:log4j.properties
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Program Arguments: (none)
2017-01-14 09:24:35,589 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - --------------------------------------------------------------------------------
2017-01-14 09:24:35,589 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-01-14 09:24:35,591 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - remoteKeytabPrincipal obtained admin
2017-01-14 09:24:35,592 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - YARN daemon is running as: admin Yarn client user obtainer: ad...@hadoop.com
2017-01-14 09:24:35,596 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Loading config from directory /srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001
2017-01-14 09:24:35,598 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 9.96.101.32
2017-01-14 09:24:35,598 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-01-14 09:24:35,598 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-01-14 09:24:35,598 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-01-14 09:24:35,598 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-01-14 09:24:35,598 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-01-14 09:24:35,599 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-01-14 09:24:35,599 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-01-14 09:24:35,599 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: security.keytab, /home/demo/flink/release/flink-1.2.0/keytab/user.keytab
2017-01-14 09:24:35,599 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: security.principal, admin
2017-01-14 09:24:35,608 INFO org.apache.flink.runtime.security.JaasConfiguration - Initializing JAAS configuration instance. Parameters: /srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/krb5.keytab, admin
2017-01-14 09:24:35,609 INFO org.apache.flink.runtime.security.SecurityUtils - SASL client auth for ZK will be disabled
2017-01-14 09:24:35,824 INFO org.apache.hadoop.security.UserGroupInformation - Login successful for user admin using keytab file /srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/krb5.keytab
2017-01-14 09:24:35,825 INFO org.apache.flink.runtime.security.SecurityUtils - Hadoop user set to ad...@hadoop.com (auth:KERBEROS)
2017-01-14 09:24:35,936 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - YARN assigned hostname for application master: 9-96-101-251
2017-01-14 09:24:35,936 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - keytabPath: /srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/krb5.keytab
2017-01-14 09:24:35,938 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - TaskManagers will be created with 1 task slots
2017-01-14 09:24:35,938 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - TaskManagers will be started with container size 1024 MB, JVM heap size 424 MB, JVM direct memory limit 424 MB
2017-01-14 09:24:35,943 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Trying to start actor system at 9.96.101.251:38785
2017-01-14 09:24:36,336 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2017-01-14 09:24:36,438 INFO Remoting - Starting remoting
2017-01-14 09:24:36,547 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@9-96-101-251:38785]
2017-01-14 09:24:36,551 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Actor system started at 9.96.101.251:38785
2017-01-14 09:24:36,551 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Actor system bound to hostname 9-96-101-251.
2017-01-14 09:24:36,554 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Setting up resources for TaskManagers
2017-01-14 09:24:36,554 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - TM:remoteKeytabPath obtained hdfs://hacluster/user/admin/.flink/application_1483499303549_0043/user.keytab
2017-01-14 09:24:36,555 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - TM:remoteKeytabPrincipal obtained admin
2017-01-14 09:24:36,555 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - TM:remoteYarnConfPath obtained null
2017-01-14 09:24:36,555 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - TM:remotekrb5Path obtained null
2017-01-14 09:24:36,932 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-01-14 09:24:36,945 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2017-01-14 09:24:36,949 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Adding keytab hdfs://hacluster/user/admin/.flink/application_1483499303549_0043/user.keytab to the AM container local resource bucket
2017-01-14 09:24:37,085 INFO org.apache.flink.yarn.Utils - Copying from file:/srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/e2943789-d80a-4abd-8ae9-2fc14cb1fd03-taskmanager-conf.yaml to hdfs://hacluster/user/admin/.flink/application_1483499303549_0043/e2943789-d80a-4abd-8ae9-2fc14cb1fd03-taskmanager-conf.yaml
2017-01-14 09:24:37,258 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Prepared local resource for modified yaml: resource { scheme: "hdfs" host: "hacluster" port: -1 file: "/user/admin/.flink/application_1483499303549_0043/e2943789-d80a-4abd-8ae9-2fc14cb1fd03-taskmanager-conf.yaml" } size: 878 timestamp: 1484357077250 type: FILE visibility: APPLICATION
2017-01-14 09:24:37,265 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Creating container launch context for TaskManagers
2017-01-14 09:24:37,265 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Starting TaskManagers with command: $JAVA_HOME/bin/java -Xms424m -Xmx424m -XX:MaxDirectMemorySize=424m -Dlog.file=<LOG_DIR>/taskmanager.log -Dlogback.configurationFile=file:./logback.xml -Dlog4j.configuration=file:./log4j.properties org.apache.flink.yarn.YarnTaskManager --configDir . 1> <LOG_DIR>/taskmanager.out 2> <LOG_DIR>/taskmanager.err
2017-01-14 09:24:37,288 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-f91bad88-1473-4e86-b151-f93dffa58baa
2017-01-14 09:24:37,289 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:53972 - max concurrent requests: 50 - max backlog: 1000
2017-01-14 09:24:37,302 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
2017-01-14 09:24:37,307 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Starting JobManager Web Frontend
2017-01-14 09:24:37,310 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/$a
2017-01-14 09:24:37,311 INFO org.apache.flink.yarn.YarnJobManager - Starting JobManager at akka.tcp://flink@9-96-101-251:38785/user/jobmanager.
2017-01-14 09:24:37,318 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of JobManager log file: /srv/BigData/hadoop/data1/nm/containerlogs/application_1483499303549_0043/container_1483499303549_0043_01_000001/jobmanager.log
2017-01-14 09:24:37,318 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of JobManager stdout file: /srv/BigData/hadoop/data1/nm/containerlogs/application_1483499303549_0043/container_1483499303549_0043_01_000001/jobmanager.out
2017-01-14 09:24:37,318 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-c4991b46-e637-4207-80ed-caef4cf5702e for the web interface files
2017-01-14 09:24:37,364 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-f8b7b6d7-b066-48a1-9536-40a3cdf42778 for web frontend JAR file uploads
2017-01-14 09:24:37,378 INFO org.apache.flink.yarn.YarnJobManager - JobManager akka.tcp://flink@9-96-101-251:38785/user/jobmanager was granted leadership with leader session ID None.
2017-01-14 09:24:37,560 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend listening at 0:0:0:0:0:0:0:0:35699
2017-01-14 09:24:37,561 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with JobManager akka.tcp://flink@9-96-101-251:38785/user/jobmanager on port 35699
2017-01-14 09:24:37,561 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka://flink/user/jobmanager#-640052308:null.
2017-01-14 09:24:37,568 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - YARN application tolerates 4 failed TaskManager containers before giving up
2017-01-14 09:24:37,571 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - YARN Application Master started
2017-01-14 09:24:37,579 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Initializing YARN resource master
2017-01-14 09:24:37,605 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2017-01-14 09:24:37,606 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Registering Application Master with tracking url http://9-96-101-251:35699
2017-01-14 09:24:37,641 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Trying to associate with JobManager leader akka://flink/user/jobmanager#-640052308
2017-01-14 09:24:37,647 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#-640052308] - leader session null
2017-01-14 09:24:37,648 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Requesting new TaskManager container with 1024 megabytes memory. Pending requests: 1
2017-01-14 09:24:37,654 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Requesting new TaskManager container with 1024 megabytes memory. Pending requests: 2
2017-01-14 09:24:37,654 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Requesting new TaskManager container with 1024 megabytes memory. Pending requests: 3
2017-01-14 09:24:37,655 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Requesting new TaskManager container with 1024 megabytes memory. Pending requests: 4
2017-01-14 09:24:38,681 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : 9-96-101-177:26009
2017-01-14 09:24:38,692 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Received new container: container_1483499303549_0043_01_000002 - Remaining pending container requests: 3
2017-01-14 09:24:38,693 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Launching TaskManager in container ContainerInLaunch @ 1484357078692: Container: [ContainerId: container_1483499303549_0043_01_000002, NodeId: 9-96-101-177:26009, NodeHttpAddress: 9-96-101-177:26010, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 9.96.101.177:26009 }, ] on host 9-96-101-177
2017-01-14 09:24:38,694 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : 9-96-101-177:26009
2017-01-14 09:24:39,189 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : 9-96-101-251:26009
2017-01-14 09:24:39,189 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : 9-96-101-32:26009
2017-01-14 09:24:39,189 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : 9-91-8-160:26009
2017-01-14 09:24:39,190 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Received new container: container_1483499303549_0043_01_000003 - Remaining pending container requests: 2
2017-01-14 09:24:39,190 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Launching TaskManager in container ContainerInLaunch @ 1484357079190: Container: [ContainerId: container_1483499303549_0043_01_000003, NodeId: 9-96-101-251:26009, NodeHttpAddress: 9-96-101-251:26010, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 9.96.101.251:26009 }, ] on host 9-96-101-251
2017-01-14 09:24:39,190 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : 9-96-101-251:26009
2017-01-14 09:24:39,202 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Received new container: container_1483499303549_0043_01_000004 - Remaining pending container requests: 1
2017-01-14 09:24:39,202 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Launching TaskManager in container ContainerInLaunch @ 1484357079202: Container: [ContainerId: container_1483499303549_0043_01_000004, NodeId: 9-96-101-32:26009, NodeHttpAddress: 9-96-101-32:26010, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 9.96.101.32:26009 }, ] on host 9-96-101-32
2017-01-14 09:24:39,202 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : 9-96-101-32:26009
2017-01-14 09:24:39,217 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Received new container: container_1483499303549_0043_01_000005 - Remaining pending container requests: 0
2017-01-14 09:24:39,217 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Launching TaskManager in container ContainerInLaunch @ 1484357079217: Container: [ContainerId: container_1483499303549_0043_01_000005, NodeId: 9-91-8-160:26009, NodeHttpAddress: 9-91-8-160:26010, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 9.91.8.160:26009 }, ] on host 9-91-8-160
2017-01-14 09:24:39,217 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : 9-91-8-160:26009
2017-01-14 09:24:43,348 INFO org.apache.flink.yarn.YarnFlinkResourceManager - TaskManager container_1483499303549_0043_01_000003 has started.
2017-01-14 09:24:43,350 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at 9-96-101-251 (akka.tcp://flink@9-96-101-251:57010/user/taskmanager) as 49800ab8cfcd1a11e45084a48281df75. Current number of registered hosts is 1. Current number of alive task slots is 1.
2017-01-14 09:24:44,881 INFO org.apache.flink.yarn.YarnFlinkResourceManager - TaskManager container_1483499303549_0043_01_000002 has started.
2017-01-14 09:24:44,881 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at 9-96-101-177 (akka.tcp://flink@9-96-101-177:35778/user/taskmanager) as 5d65baf1ec196cf3ac5bc43870156855. Current number of registered hosts is 2. Current number of alive task slots is 2.
2017-01-14 09:24:45,855 INFO org.apache.flink.yarn.YarnFlinkResourceManager - TaskManager container_1483499303549_0043_01_000004 has started.
2017-01-14 09:24:45,855 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at 9-96-101-32 (akka.tcp://flink@9-96-101-32:58486/user/taskmanager) as 80de4a47fa60536b78ea052cbadec7ee. Current number of registered hosts is 3. Current number of alive task slots is 3.
2017-01-14 09:24:46,018 INFO org.apache.flink.yarn.YarnFlinkResourceManager - TaskManager container_1483499303549_0043_01_000005 has started.
2017-01-14 09:24:46,018 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at 9-91-8-160 (akka.tcp://flink@9-91-8-160:47548/user/taskmanager) as e9f2497d6223b2d704b3aced665a3c02. Current number of registered hosts is 4. Current number of alive task slots is 4.
2017-01-14 09:29:58,066 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@9.96.101.251:38785/]] arriving at [akka.tcp://flink@9.96.101.251:38785] inbound addresses are [akka.tcp://flink@9-96-101-251:38785]

Thanks!
HuangWHWHW
2017/1/14

-----Original Message-----
From: Till Rohrmann [mailto:trohrm...@apache.org]
Sent: 13 January 2017 18:22
To: dev@flink.apache.org
Cc: u...@flink.apache.org
Subject: Re: some questions about submit flink job on flink-yarn

Hi Huang,

this seems to be very strange, because the JobManager's actor system has bound to the address 9-96-101-177 instead of 9.96.101.177. It seems as if the . has been replaced by -. Could you maybe tell me which version of Flink you're running and also share the complete JobManager log with us? I tested it with the latest 1.2 SNAPSHOT version and there it seemed to work.

Cheers,
Till

On Fri, Jan 13, 2017 at 9:02 AM, huangwei (G) <huangwei...@huawei.com> wrote:
> Dear All,
>
> I get the following error in jobmanager.log when I submit a Flink job
> (batch/WordCount.jar) with the command: "./bin/flink run -m
> 9.96.101.177:39180 ./examples/batch/WordCount.jar".
>
> Flink is running on a YARN cluster.
>
> Error in jobmanager.log:
> 2017-01-13 15:28:27,402 ERROR akka.remote.EndpointWriter - dropping message
> [class akka.actor.ActorSelectionMessage] for non-local recipient
> [Actor[akka.tcp://flink@9.96.101.177:39180/]] arriving at
> [akka.tcp://flink@9.96.101.177:39180] inbound addresses
> are [akka.tcp://flink@9-96-101-177:39180]
>
> However, the submission succeeds when I use the Flink web UI.
>
> How can I solve this problem?
>
> Also, when I started Flink on YARN, jobmanager.rpc.port and
> the web port were both changed to 39180 and 57724.
> The configuration in flink-conf.yaml is just the default:
>
> jobmanager.rpc.port: 6123
>
> and
>
> jobmanager.web.port: 8081
>
> I started Flink on YARN with the command: "./bin/yarn-session.sh -n 4".
>
> Why were the ports changed to 39180 and 57724?
>
> Many thanks for any help!
>
> HuangWHWHW
> 2017.1.13
>
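P.S. Regarding the submission command in the quoted mail above, these are the two variants I can think of to avoid the address mismatch. Both are only sketches based on my understanding of the yarn-session workflow and of the bound address shown in the log (the hostname form 9-96-101-177 and the properties-file lookup are assumptions on my side), and neither is verified on the hardened YARN yet.

    # Variant 1: drop -m entirely and let the CLI find the running YARN session
    # through the properties file that ./bin/yarn-session.sh wrote at startup
    # (assumption: that file is still present on the client machine).
    ./bin/flink run ./examples/batch/WordCount.jar

    # Variant 2: address the JobManager exactly as its actor system bound itself,
    # i.e. by the hostname rather than by the IP, so that Akka does not treat the
    # message as being for a non-local recipient
    # (assumption: 9-96-101-177 resolves from the client machine).
    ./bin/flink run -m 9-96-101-177:39180 ./examples/batch/WordCount.jar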