Hi All,

I'm trying to set up a Flink 1.0.0 cluster on Docker (separate containers for
the jobmanager and taskmanager) inside AWS (using the AWS ECS service). I tested
it locally and it's working fine, but on AWS Docker I am running into the
following issue:

2016-03-09 18:04:12,114 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main]
o.a.f.runtime.jobmanager.JobManager - Starting JobManager with
high-availability
2016-03-09 18:04:12,118 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main]
o.a.f.runtime.jobmanager.JobManager - Starting JobManager on
172.31.63.152:8079 with execution mode CLUSTER
2016-03-09 18:04:12,172 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main]
o.a.f.runtime.jobmanager.JobManager - Security is not enabled. Starting
non-authenticated JobManager.
2016-03-09 18:04:12,174 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main]
org.apache.flink.util.NetUtils - Trying to open socket on port 8079
2016-03-09 18:04:12,176 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main]
org.apache.flink.util.NetUtils - Unable to allocate socket on port
java.net.BindException: Cannot assign requested address
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.<init>(ServerSocket.java:237)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2$$anon$3.createSocket(JobManager.scala:1722)
    at org.apache.flink.util.NetUtils.createSocketFromPorts(NetUtils.java:237)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:1719)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1772)
    at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:12,180 PST [ERROR] ec2-52-3-248-202.compute-1.ama [main]
o.a.f.runtime.jobmanager.JobManager - Failed to run JobManager.
java.lang.RuntimeException: Unable to do further retries starting the
actor system
    at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1777)
    at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:12,991 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main]
o.a.h.m.lib.MutableMetricsFactory - field
org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=,
sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful
kerberos logins and latency (milliseconds)], valueName=Time)


Initially the JobManager tried to bind to port 0, which did not work. On looking
further into it, I tried setting recovery.jobmanager.port to different
port combinations, but that does not seem to work either. I've exposed the
ports in the docker compose file as well.
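
To narrow this down outside Flink, here is a minimal sketch of a bind check
that could be run inside the jobmanager container. It is not part of Flink;
the helper name can_bind is hypothetical, and it assumes Python is available
in the image. As far as I understand, "Cannot assign requested address"
corresponds to EADDRNOTAVAIL, which usually means the address is not
configured on any interface visible to the process (which may be the case for
172.31.63.152 inside the container), rather than the port being taken.

```python
import errno
import socket
from typing import Tuple


def can_bind(host: str, port: int) -> Tuple[bool, str]:
    """Try to open a listening TCP socket on (host, port), as a server would."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
            sock.listen(1)
        return True, "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # Same condition the JVM reports as "Cannot assign requested address".
            return False, "address is not assigned to any local interface"
        if exc.errno == errno.EADDRINUSE:
            return False, "port is already in use"
        return False, str(exc)


if __name__ == "__main__":
    # 172.31.63.152 and 8079 are the values from the log above.
    for host in ("0.0.0.0", "172.31.63.152"):
        print(host, can_bind(host, 8079))
```

If 0.0.0.0 binds but 172.31.63.152 does not, that would suggest the address
the JobManager resolves for itself is not one the container actually owns.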


Please find attached the jobmanager log file for details, along with the
jobmanager config file.
-- 
Thanks,
Deepak Jha
2016-03-09 18:04:11,887 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - 
--------------------------------------------------------------------------------
2016-03-09 18:04:11,888 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - Registered UNIX signal handlers for 
[TERM, HUP, INT]
2016-03-09 18:04:12,070 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - Loading configuration from 
/opt/flink-1.0.0/conf
2016-03-09 18:04:12,082 PST [WARN]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Error while reading configuration: Cannot read 
property 0
2016-03-09 18:04:12,083 PST [WARN]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Error while reading configuration: Cannot read 
property 1
2016-03-09 18:04:12,083 PST [WARN]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Error while reading configuration: Cannot read 
property 2
2016-03-09 18:04:12,091 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
recovery.jobmanager.port, 8079
2016-03-09 18:04:12,095 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
jobmanager.rpc.address, ec2-52-3-248-202.compute-1.amazonaws.com
2016-03-09 18:04:12,095 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
jobmanager.rpc.port, 6123
2016-03-09 18:04:12,095 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
jobmanager.heap.mb, 512
2016-03-09 18:04:12,096 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: blob.server.port, 
50100-50200
2016-03-09 18:04:12,096 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
jobmanager.web.port, 8080
2016-03-09 18:04:12,097 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: state.backend, 
filesystem
2016-03-09 18:04:12,097 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
state.backend.fs.checkpointdir, s3://flink-dev/checkpoints
2016-03-09 18:04:12,104 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
fs.hdfs.hadoopconf, /opt/flink/conf
2016-03-09 18:04:12,105 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
fs.overwrite-files, true
2016-03-09 18:04:12,105 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
fs.output.always-create-directory, true
2016-03-09 18:04:12,105 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: recovery.mode, 
zookeeper
2016-03-09 18:04:12,105 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
recovery.zookeeper.quorum, 
52.87.232.166:2181,54.88.145.121:2181,52.3.253.96:2181
2016-03-09 18:04:12,106 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
recovery.zookeeper.path.root, /flink-dev
2016-03-09 18:04:12,106 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
recovery.zookeeper.storageDir, s3://flink-dev/zk_recovery
2016-03-09 18:04:12,106 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
savepoints.state.backend, filesystem
2016-03-09 18:04:12,107 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.c.GlobalConfiguration - Loading configuration property: 
savepoints.state.backend.fs.dir, s3://flink-dev/savepoints
2016-03-09 18:04:12,114 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - Starting JobManager with high-availability
2016-03-09 18:04:12,118 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - Starting JobManager on 172.31.63.152:8079 
with execution mode CLUSTER
2016-03-09 18:04:12,172 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - Security is not enabled. Starting 
non-authenticated JobManager.
2016-03-09 18:04:12,174 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
org.apache.flink.util.NetUtils - Trying to open socket on port 8079
2016-03-09 18:04:12,176 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
org.apache.flink.util.NetUtils - Unable to allocate socket on port
java.net.BindException: Cannot assign requested address
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.<init>(ServerSocket.java:237)
    at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2$$anon$3.createSocket(JobManager.scala:1722)
    at org.apache.flink.util.NetUtils.createSocketFromPorts(NetUtils.java:237)
    at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply$mcV$sp(JobManager.scala:1719)
    at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)
    at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$2.apply(JobManager.scala:1717)
    at scala.util.Try$.apply(Try.scala:192)
    at 
org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1772)
    at 
org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)
    at 
org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:12,180 PST [ERROR] ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager.JobManager - Failed to run JobManager.
java.lang.RuntimeException: Unable to do further retries starting the actor 
system
    at 
org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:1777)
    at 
org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:1717)
    at 
org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1653)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:12,991 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.m.lib.MutableMetricsFactory - field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, 
sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos 
logins and latency (milliseconds)], valueName=Time)
2016-03-09 18:04:13,006 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.m.lib.MutableMetricsFactory - field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, 
sampleName=Ops, always=false, type=DEFAULT, value=[Rate of failed kerberos 
logins and latency (milliseconds)], valueName=Time)
2016-03-09 18:04:13,007 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.m.lib.MutableMetricsFactory - field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, 
sampleName=Ops, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
2016-03-09 18:04:13,008 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
2016-03-09 18:04:13,217 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:303)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:328)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at 
org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:272)
    at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:790)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:760)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:633)
    at 
org.apache.flink.runtime.util.EnvironmentInformation.getUserRunning(EnvironmentInformation.java:90)
    at 
org.apache.flink.runtime.util.EnvironmentInformation.logEnvironmentInfo(EnvironmentInformation.java:284)
    at 
org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1595)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
2016-03-09 18:04:13,319 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
org.apache.hadoop.util.Shell - setsid exited with exit code 0
2016-03-09 18:04:13,325 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting 
default realm to empty
2016-03-09 18:04:13,328 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
org.apache.hadoop.security.Groups -  Creating new Groups object
2016-03-09 18:04:13,329 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built 
native-hadoop library...
2016-03-09 18:04:13,330 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: 
java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2016-03-09 18:04:13,330 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.hadoop.util.NativeCodeLoader - 
java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-03-09 18:04:13,330 PST [WARN]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable
2016-03-09 18:04:13,331 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
2016-03-09 18:04:13,332 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
2016-03-09 18:04:13,462 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
org.apache.hadoop.security.Groups - Group mapping 
impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; 
cacheTimeout=300000; warningDeltaMs=5000
2016-03-09 18:04:13,469 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.security.UserGroupInformation - hadoop login
2016-03-09 18:04:13,470 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.security.UserGroupInformation - hadoop login commit
2016-03-09 18:04:13,474 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: root
2016-03-09 18:04:13,476 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: root" with 
name root
2016-03-09 18:04:13,476 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.security.UserGroupInformation - User entry: "root"
2016-03-09 18:04:13,477 PST [DEBUG] ec2-52-3-248-202.compute-1.ama [main] 
o.a.h.security.UserGroupInformation - UGI loginUser:root (auth:SIMPLE)
2016-03-09 18:04:13,478 PST [INFO]  ec2-52-3-248-202.compute-1.ama [main] 
o.a.f.runtime.jobmanager
