Hi Maximilian,

Thanks for your response. I will wait for the update.

On Monday, March 14, 2016, Maximilian Michels <m...@apache.org> wrote:
> Hi Deepak,
>
> We'll look more into this problem this week. Until now we considered it a
> configuration issue if the bind address was not externally reachable.
> However, one might not always have the possibility to change this network
> configuration.
>
> Looking further, it is actually possible to let the bind address be
> different from the advertised address. From the Akka FAQ at
> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
>
> > If you are running an ActorSystem under a NAT or inside a docker container,
> > make sure to set akka.remote.netty.tcp.hostname and
> > akka.remote.netty.tcp.port to the address it is reachable at from other
> > ActorSystems. If you need to bind your network interface to a different
> > address - use akka.remote.netty.tcp.bind-hostname and
> > akka.remote.netty.tcp.bind-port settings. Also make sure your network is
> > configured to translate from the address your ActorSystem is reachable at
> > to the address your ActorSystem network interface is bound to.
>
> It looks like we have to expose this configuration to users who have a
> special network setup.
>
> Best,
> Max
>
> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha <dkjhan...@gmail.com> wrote:
>
> > Hi Stephan & Ufuk,
> > Thanks for your response.
> >
> > Yes, there is a way in which you can run docker (net = host mode) in which
> > guest machine's network stack gets shared by docker container.
> > Unfortunately it's not supported by AWS ECS.
> >
> > I do have one more question for you. Can you please explain to me what
> > happens when taskmanagers register themselves with the jobmanager in HA mode?
> > Does each taskmanager get connected to the jobmanager on a separate port? The
> > reason I'm asking is that if I run 2 taskmanagers (in separate docker
> > containers), they are able to attach themselves to the jobmanager (another
> > docker container; Flink HA setup using a remote zk cluster), but soon
> > after that they get disconnected.
> > Logs are not very helpful either... I suspect
> > that each taskmanager gets connected on a new port and since by default
> > docker does not expose all ports, this may happen.... I do not see this
> > happen when I do not use docker containers....
> >
> > Here is the log file that I saw in jobmanager....
> >
> > 2016-03-12 08:55:55,010 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. *Current number of alive task slots is 1.*
> > 2016-03-12 08:57:42,676 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7. Current number of registered hosts is 2. *Current number of alive task slots is 2.*
> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.*
> > 2016-03-12 08:58:01,417 PST [WARN] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> > 2016-03-12 08:58:01,451 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> > 2016-03-12 08:58:01,451 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.*
> > 2016-03-12 08:58:01,465 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1. Current number of alive task slots is 1.*
> > 2016-03-12 08:58:03,383 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:58:03,384 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0.
> > Number of available slots 0.*
> > 2016-03-12 08:58:04,988 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
> > 2016-03-12 08:58:04,988 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is 1. Current number of alive task slots is 1.
> > 2016-03-12 08:58:21,382 PST [WARN] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> > 2016-03-12 08:58:21,388 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> > 2016-03-12 08:58:21,388 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:21,390 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is 1.
> > Current number of alive task slots is 1.
> > 2016-03-12 08:58:25,433 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> > 2016-03-12 08:58:25,434 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> > 2016-03-12 08:58:28,947 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
> > 2016-03-12 08:58:28,948 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is 1. Current number of alive task slots is 1.
> >
> > On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi Deepak!
> > >
> > > We can currently not split the bind address and advertised address, because
> > > the Akka library only accepts packages sent explicitly to the bind address
> > > (not sure why Akka has this artificial limitation, but it is there).
> > >
> > > Can you bridge the container IP address to be visible from the outside?
> > >
> > > Stephan
> > >
> > > On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <u...@apache.org> wrote:
> > >
> > > > Hey Deepak!
> > > >
> > > > Your description of Flink's behaviour is correct.
> > > > To summarize:
> > > >
> > > > # Host Address
> > > >
> > > > If you specify a host address as an argument to the JVM (via
> > > > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> > > > If you don't, it falls back to the value configured in flink-conf.yaml
> > > > (what you describe).
> > > >
> > > > # Ports
> > > >
> > > > By default, a random port is used and published via ZooKeeper. You can
> > > > configure a port range only via recovery.jobmanager.port (what you
> > > > describe).
> > > >
> > > > ---
> > > >
> > > > Your proposal would likely solve the issue, but isn't it possible to
> > > > handle this outside of Flink? I've found this Stack Overflow question,
> > > > which should be related:
> > > >
> > > > http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
> > > >
> > > > What's your opinion?
> >
> > --
> > Thanks,
> > Deepak Jha

-- Sent from Gmail Mobile
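
For reference, the bind/advertised split described in the Akka FAQ quoted in this thread might look roughly like the fragment below in an Akka 2.4 remoting configuration. This is only a sketch: the four setting names come from the FAQ excerpt, but every address and port value is a placeholder (172.17.0.3:6121 is borrowed from the log lines, and 203.0.113.10 is a made-up externally reachable address). Whether and how Flink passes these keys through to Akka is not settled in this thread.

```
akka.remote.netty.tcp {
  # Advertised address: what other ActorSystems use to reach this one.
  # Placeholder value; it would be the externally routable host address.
  hostname = "203.0.113.10"
  port = 6121

  # Bind address: what the network interface inside the container listens on.
  # Placeholder value taken from the container IP seen in the logs.
  bind-hostname = "172.17.0.3"
  bind-port = 6121
}
```

As the FAQ notes, the network (for example a Docker port mapping) still has to translate the advertised address and port to the bound ones.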