Hi Till,
The "9-96-101-177" is just the hostname.
I reran Flink on YARN and the jobmanager.log is below; sorry that I had to
redact some sensitive lines. By the way, the ports (the other question in my
earlier mail) still seem to be random values (this time 38785 and 35699).
I am using flink-1.2.0. It works well on the YARN that comes from the Apache
open-source project, but here I am running Flink on a special YARN that has had
some security hardening applied on top of Apache YARN. I just have no idea what
the ERROR log line means.
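(For reference, this is the kind of flink-conf.yaml change I was thinking of to
pin the ApplicationMaster port instead of getting a random one on every run. It
is only a sketch based on my reading of the docs; I have not verified it on this
secured YARN, and the port range is just an example:

    # flink-conf.yaml (sketch, untested on my cluster)
    # constrain the JobManager/ApplicationMaster RPC port on YARN to a fixed range
    yarn.application-master.port: 38000-38050

I assume the web port is picked in a similarly dynamic way when running on YARN.)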
jobmanager.log:
2017-01-14 09:24:35,584 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
-
--------------------------------------------------------------------------------
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Starting YARN ApplicationMaster / ResourceManager / JobManager
(Version: 1.2.0, Rev:82b1079, Date:04.01.2017 @ 17:38:23 CST)
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Current user: admin
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation -
1.8/25.112-b15
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Maximum heap size: 406 MiBytes
2017-01-14 09:24:35,585 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- JAVA_HOME: /opt/huawei/Bigdata/jdk1.8.0_112/
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Hadoop version: 2.7.2
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- JVM Options:
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- -Xmx424M
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
-
-Dlog.file=/srv/BigData/hadoop/data1/nm/containerlogs/application_1483499303549_0043/container_1483499303549_0043_01_000001/jobmanager.log
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- -Dlogback.configurationFile=file:logback.xml
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- -Dlog4j.configuration=file:log4j.properties
2017-01-14 09:24:35,587 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Program Arguments: (none)
2017-01-14 09:24:35,589 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
-
--------------------------------------------------------------------------------
2017-01-14 09:24:35,589 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Registered UNIX signal handlers for [TERM, HUP, INT]
2017-01-14 09:24:35,591 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- remoteKeytabPrincipal obtained admin
2017-01-14 09:24:35,592 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- YARN daemon is running as: admin Yarn client user obtainer:
[email protected]
2017-01-14 09:24:35,596 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Loading config from directory
/srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001
2017-01-14 09:24:35,598 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.rpc.address, 9.96.101.32
2017-01-14 09:24:35,598 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.rpc.port, 6123
2017-01-14 09:24:35,598 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.heap.mb, 256
2017-01-14 09:24:35,598 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: taskmanager.heap.mb, 512
2017-01-14 09:24:35,598 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: taskmanager.numberOfTaskSlots, 1
2017-01-14 09:24:35,598 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: taskmanager.memory.preallocate, false
2017-01-14 09:24:35,599 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: parallelism.default, 1
2017-01-14 09:24:35,599 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.web.port, 8081
2017-01-14 09:24:35,599 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: security.keytab,
/home/demo/flink/release/flink-1.2.0/keytab/user.keytab
2017-01-14 09:24:35,599 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: security.principal, admin
2017-01-14 09:24:35,608 INFO
org.apache.flink.runtime.security.JaasConfiguration - Initializing
JAAS configuration instance. Parameters:
/srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/krb5.keytab,
admin
2017-01-14 09:24:35,609 INFO org.apache.flink.runtime.security.SecurityUtils
- SASL client auth for ZK will be disabled
2017-01-14 09:24:35,824 INFO org.apache.hadoop.security.UserGroupInformation
- Login successful for user admin using keytab file
/srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/krb5.keytab
2017-01-14 09:24:35,825 INFO org.apache.flink.runtime.security.SecurityUtils
- Hadoop user set to [email protected] (auth:KERBEROS)
2017-01-14 09:24:35,936 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- YARN assigned hostname for application master: 9-96-101-251
2017-01-14 09:24:35,936 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- keytabPath:
/srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/krb5.keytab
2017-01-14 09:24:35,938 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- TaskManagers will be created with 1 task slots
2017-01-14 09:24:35,938 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- TaskManagers will be started with container size 1024 MB, JVM
heap size 424 MB, JVM direct memory limit 424 MB
2017-01-14 09:24:35,943 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Trying to start actor system at 9.96.101.251:38785
2017-01-14 09:24:36,336 INFO akka.event.slf4j.Slf4jLogger
- Slf4jLogger started
2017-01-14 09:24:36,438 INFO Remoting
- Starting remoting
2017-01-14 09:24:36,547 INFO Remoting
- Remoting started; listening on addresses
:[akka.tcp://flink@9-96-101-251:38785]
2017-01-14 09:24:36,551 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Actor system started at 9.96.101.251:38785
2017-01-14 09:24:36,551 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Actor system bound to hostname 9-96-101-251.
2017-01-14 09:24:36,554 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Setting up resources for TaskManagers
2017-01-14 09:24:36,554 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- TM:remoteKeytabPath obtained
hdfs://hacluster/user/admin/.flink/application_1483499303549_0043/user.keytab
2017-01-14 09:24:36,555 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- TM:remoteKeytabPrincipal obtained admin
2017-01-14 09:24:36,555 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- TM:remoteYarnConfPath obtained null
2017-01-14 09:24:36,555 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- TM:remotekrb5Path obtained null
2017-01-14 09:24:36,932 WARN org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
2017-01-14 09:24:36,945 WARN
org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The
short-circuit local reads feature cannot be used because libhadoop cannot be
loaded.
2017-01-14 09:24:36,949 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Adding keytab
hdfs://hacluster/user/admin/.flink/application_1483499303549_0043/user.keytab
to the AM container local resource bucket
2017-01-14 09:24:37,085 INFO org.apache.flink.yarn.Utils
- Copying from
file:/srv/BigData/hadoop/data1/nm/localdir/usercache/admin/appcache/application_1483499303549_0043/container_1483499303549_0043_01_000001/e2943789-d80a-4abd-8ae9-2fc14cb1fd03-taskmanager-conf.yaml
to
hdfs://hacluster/user/admin/.flink/application_1483499303549_0043/e2943789-d80a-4abd-8ae9-2fc14cb1fd03-taskmanager-conf.yaml
2017-01-14 09:24:37,258 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Prepared local resource for modified yaml: resource { scheme:
"hdfs" host: "hacluster" port: -1 file:
"/user/admin/.flink/application_1483499303549_0043/e2943789-d80a-4abd-8ae9-2fc14cb1fd03-taskmanager-conf.yaml"
} size: 878 timestamp: 1484357077250 type: FILE visibility: APPLICATION
2017-01-14 09:24:37,265 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Creating container launch context for TaskManagers
2017-01-14 09:24:37,265 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Starting TaskManagers with command: $JAVA_HOME/bin/java -Xms424m
-Xmx424m -XX:MaxDirectMemorySize=424m -Dlog.file=<LOG_DIR>/taskmanager.log
-Dlogback.configurationFile=file:./logback.xml
-Dlog4j.configuration=file:./log4j.properties
org.apache.flink.yarn.YarnTaskManager --configDir . 1>
<LOG_DIR>/taskmanager.out 2> <LOG_DIR>/taskmanager.err
2017-01-14 09:24:37,288 INFO org.apache.flink.runtime.blob.BlobServer
- Created BLOB server storage directory
/tmp/blobStore-f91bad88-1473-4e86-b151-f93dffa58baa
2017-01-14 09:24:37,289 INFO org.apache.flink.runtime.blob.BlobServer
- Started BLOB server at 0.0.0.0:53972 - max concurrent requests:
50 - max backlog: 1000
2017-01-14 09:24:37,302 INFO org.apache.flink.runtime.metrics.MetricRegistry
- No metrics reporter configured, no metrics will be
exposed/reported.
2017-01-14 09:24:37,307 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- Starting JobManager Web Frontend
2017-01-14 09:24:37,310 INFO
org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory
archivist akka://flink/user/$a
2017-01-14 09:24:37,311 INFO org.apache.flink.yarn.YarnJobManager
- Starting JobManager at
akka.tcp://flink@9-96-101-251:38785/user/jobmanager.
2017-01-14 09:24:37,318 INFO
org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined
location of JobManager log file:
/srv/BigData/hadoop/data1/nm/containerlogs/application_1483499303549_0043/container_1483499303549_0043_01_000001/jobmanager.log
2017-01-14 09:24:37,318 INFO
org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined
location of JobManager stdout file:
/srv/BigData/hadoop/data1/nm/containerlogs/application_1483499303549_0043/container_1483499303549_0043_01_000001/jobmanager.out
2017-01-14 09:24:37,318 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory
/tmp/flink-web-c4991b46-e637-4207-80ed-caef4cf5702e for the web interface files
2017-01-14 09:24:37,364 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory
/tmp/flink-web-f8b7b6d7-b066-48a1-9536-40a3cdf42778 for web frontend JAR file
uploads
2017-01-14 09:24:37,378 INFO org.apache.flink.yarn.YarnJobManager
- JobManager akka.tcp://flink@9-96-101-251:38785/user/jobmanager
was granted leadership with leader session ID None.
2017-01-14 09:24:37,560 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend
listening at 0:0:0:0:0:0:0:0:35699
2017-01-14 09:24:37,561 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with
JobManager akka.tcp://flink@9-96-101-251:38785/user/jobmanager on port 35699
2017-01-14 09:24:37,561 INFO
org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader
reachable under akka://flink/user/jobmanager#-640052308:null.
2017-01-14 09:24:37,568 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- YARN application tolerates 4 failed TaskManager containers before
giving up
2017-01-14 09:24:37,571 INFO org.apache.flink.yarn.YarnApplicationMasterRunner
- YARN Application Master started
2017-01-14 09:24:37,579 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Initializing YARN resource master
2017-01-14 09:24:37,605 INFO
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy -
yarn.client.max-cached-nodemanagers-proxies : 0
2017-01-14 09:24:37,606 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Registering Application Master with tracking url
http://9-96-101-251:35699
2017-01-14 09:24:37,641 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Trying to associate with JobManager leader
akka://flink/user/jobmanager#-640052308
2017-01-14 09:24:37,647 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Resource Manager associating with leading JobManager
Actor[akka://flink/user/jobmanager#-640052308] - leader session null
2017-01-14 09:24:37,648 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Requesting new TaskManager container with 1024 megabytes memory.
Pending requests: 1
2017-01-14 09:24:37,654 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Requesting new TaskManager container with 1024 megabytes memory.
Pending requests: 2
2017-01-14 09:24:37,654 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Requesting new TaskManager container with 1024 megabytes memory.
Pending requests: 3
2017-01-14 09:24:37,655 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Requesting new TaskManager container with 1024 megabytes memory.
Pending requests: 4
2017-01-14 09:24:38,681 INFO
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new
token for : 9-96-101-177:26009
2017-01-14 09:24:38,692 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Received new container: container_1483499303549_0043_01_000002 -
Remaining pending container requests: 3
2017-01-14 09:24:38,693 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Launching TaskManager in container ContainerInLaunch @
1484357078692: Container: [ContainerId: container_1483499303549_0043_01_000002,
NodeId: 9-96-101-177:26009, NodeHttpAddress: 9-96-101-177:26010, Resource:
<memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
service: 9.96.101.177:26009 }, ] on host 9-96-101-177
2017-01-14 09:24:38,694 INFO
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy -
Opening proxy : 9-96-101-177:26009
2017-01-14 09:24:39,189 INFO
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new
token for : 9-96-101-251:26009
2017-01-14 09:24:39,189 INFO
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new
token for : 9-96-101-32:26009
2017-01-14 09:24:39,189 INFO
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new
token for : 9-91-8-160:26009
2017-01-14 09:24:39,190 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Received new container: container_1483499303549_0043_01_000003 -
Remaining pending container requests: 2
2017-01-14 09:24:39,190 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Launching TaskManager in container ContainerInLaunch @
1484357079190: Container: [ContainerId: container_1483499303549_0043_01_000003,
NodeId: 9-96-101-251:26009, NodeHttpAddress: 9-96-101-251:26010, Resource:
<memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
service: 9.96.101.251:26009 }, ] on host 9-96-101-251
2017-01-14 09:24:39,190 INFO
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy -
Opening proxy : 9-96-101-251:26009
2017-01-14 09:24:39,202 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Received new container: container_1483499303549_0043_01_000004 -
Remaining pending container requests: 1
2017-01-14 09:24:39,202 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Launching TaskManager in container ContainerInLaunch @
1484357079202: Container: [ContainerId: container_1483499303549_0043_01_000004,
NodeId: 9-96-101-32:26009, NodeHttpAddress: 9-96-101-32:26010, Resource:
<memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
service: 9.96.101.32:26009 }, ] on host 9-96-101-32
2017-01-14 09:24:39,202 INFO
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy -
Opening proxy : 9-96-101-32:26009
2017-01-14 09:24:39,217 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Received new container: container_1483499303549_0043_01_000005 -
Remaining pending container requests: 0
2017-01-14 09:24:39,217 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- Launching TaskManager in container ContainerInLaunch @
1484357079217: Container: [ContainerId: container_1483499303549_0043_01_000005,
NodeId: 9-91-8-160:26009, NodeHttpAddress: 9-91-8-160:26010, Resource:
<memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
service: 9.91.8.160:26009 }, ] on host 9-91-8-160
2017-01-14 09:24:39,217 INFO
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy -
Opening proxy : 9-91-8-160:26009
2017-01-14 09:24:43,348 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- TaskManager container_1483499303549_0043_01_000003 has started.
2017-01-14 09:24:43,350 INFO org.apache.flink.runtime.instance.InstanceManager
- Registered TaskManager at 9-96-101-251
(akka.tcp://flink@9-96-101-251:57010/user/taskmanager) as
49800ab8cfcd1a11e45084a48281df75. Current number of registered hosts is 1.
Current number of alive task slots is 1.
2017-01-14 09:24:44,881 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- TaskManager container_1483499303549_0043_01_000002 has started.
2017-01-14 09:24:44,881 INFO org.apache.flink.runtime.instance.InstanceManager
- Registered TaskManager at 9-96-101-177
(akka.tcp://flink@9-96-101-177:35778/user/taskmanager) as
5d65baf1ec196cf3ac5bc43870156855. Current number of registered hosts is 2.
Current number of alive task slots is 2.
2017-01-14 09:24:45,855 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- TaskManager container_1483499303549_0043_01_000004 has started.
2017-01-14 09:24:45,855 INFO org.apache.flink.runtime.instance.InstanceManager
- Registered TaskManager at 9-96-101-32
(akka.tcp://flink@9-96-101-32:58486/user/taskmanager) as
80de4a47fa60536b78ea052cbadec7ee. Current number of registered hosts is 3.
Current number of alive task slots is 3.
2017-01-14 09:24:46,018 INFO org.apache.flink.yarn.YarnFlinkResourceManager
- TaskManager container_1483499303549_0043_01_000005 has started.
2017-01-14 09:24:46,018 INFO org.apache.flink.runtime.instance.InstanceManager
- Registered TaskManager at 9-91-8-160
(akka.tcp://flink@9-91-8-160:47548/user/taskmanager) as
e9f2497d6223b2d704b3aced665a3c02. Current number of registered hosts is 4.
Current number of alive task slots is 4.
2017-01-14 09:29:58,066 ERROR akka.remote.EndpointWriter
- dropping message [class akka.actor.ActorSelectionMessage] for
non-local recipient [Actor[akka.tcp://[email protected]:38785/]] arriving at
[akka.tcp://[email protected]:38785] inbound addresses are
[akka.tcp://flink@9-96-101-251:38785]
Thanks!
HuangWHWHW
2017/1/14
-----Original Message-----
From: Till Rohrmann [mailto:[email protected]]
Sent: January 13, 2017 18:22
To: [email protected]
Cc: [email protected]
Subject: Re: some questions about submit flink job on flink-yarn
Hi Huang,
this seems to be very strange, because the JobManager’s actor system has bound
to the address 9-96-101-177 instead of 9.96.101.177. It seems as if the "."
characters have been replaced by "-".
Could you maybe tell me which version of Flink you’re running and also share
the complete JobManager log with us?
I tested it with the latest 1.2 SNAPSHOT version and there it seemed to work.
Cheers,
Till
On Fri, Jan 13, 2017 at 9:02 AM, huangwei (G) <[email protected]>
wrote:
> Dear All,
>
> I get the following error in jobmanager.log when I submit a Flink job
> (batch/WordCount.jar) with the command: "./bin/flink run -m
> 9.96.101.177:39180 ./examples/batch/WordCount.jar".
>
> Flink is running on a YARN cluster.
>
> Error in jobmanager.log:
> 2017-01-13 15:28:27,402 ERROR akka.remote.EndpointWriter
> - dropping message [class
> akka.actor.ActorSelectionMessage] for non-local recipient
> [Actor[akka.tcp://[email protected]:39180/]]
> arriving at [akka.tcp://[email protected]:39180] inbound addresses
> are [akka.tcp://flink@9-96-101-177:39180]
>
> However, submitting the job through the Flink web UI works fine.
>
> How to solve this problem?
>
> Also, when I started Flink on YARN, the jobmanager.rpc.port and the web
> port were both changed, to 39180 and 57724 respectively.
> The corresponding configuration in flink-conf.yaml is left at the defaults:
>
> jobmanager.rpc.port: 6123
>
> and
>
> jobmanager.web.port: 8081
>
> I started Flink on YARN with the command: "./bin/yarn-session.sh -n 4".
>
> Why were the ports changed to 39180 and 57724?
>
> Many thanks for any help!
>
> HuangWHWHW
> 2017.1.13
>