Hi Maximilian,

Thanks for the email and for looking into the issue. I'm using Scala 2.11, so that sounds perfect to me. I will be more than happy to test it out.

On Tue, Mar 22, 2016 at 2:48 AM, Maximilian Michels <m...@apache.org> wrote:

> Hi Deepak,
>
> We have looked further into this and have a pretty easy fix. However,
> it will only work with Flink's Scala 2.11 version because newer
> versions of the Akka library are incompatible with Scala 2.10 (Flink's
> default Scala version). Would that be a viable option for you?
>
> We're currently discussing this here:
> https://issues.apache.org/jira/browse/FLINK-2821
>
> Best,
> Max
>
> On Mon, Mar 14, 2016 at 4:49 PM, Deepak Jha <dkjhan...@gmail.com> wrote:
>
> > Hi Maximilian,
> > Thanks for your response. I will wait for the update.
> >
> > On Monday, March 14, 2016, Maximilian Michels <m...@apache.org> wrote:
> >
> >> Hi Deepak,
> >>
> >> We'll look more into this problem this week. Until now we considered it a
> >> configuration issue if the bind address was not externally reachable.
> >> However, one might not always have the possibility to change this network
> >> configuration.
> >>
> >> Looking further, it is actually possible to let the bind address be
> >> different from the advertised address. From the Akka FAQ at
> >> http://doc.akka.io/docs/akka/2.4.1/additional/faq.html:
> >>
> >> > If you are running an ActorSystem under a NAT or inside a docker container,
> >> > make sure to set akka.remote.netty.tcp.hostname and
> >> > akka.remote.netty.tcp.port to the address it is reachable at from other
> >> > ActorSystems. If you need to bind your network interface to a different
> >> > address - use akka.remote.netty.tcp.bind-hostname and
> >> > akka.remote.netty.tcp.bind-port settings. Also make sure your network is
> >> > configured to translate from the address your ActorSystem is reachable at
> >> > to the address your ActorSystem network interface is bound to.
> >>
> >> It looks like we have to expose this configuration to users who have a
> >> special network setup.
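
For reference, the four Akka settings named in the FAQ excerpt could be combined roughly as follows. This is a minimal sketch of plain Akka 2.4 remoting configuration, not a Flink option: both addresses and the port number are placeholder values chosen for illustration, and how Flink would expose these settings is still being discussed in FLINK-2821.

```hocon
akka {
  remote {
    netty.tcp {
      # Advertised address: where other ActorSystems reach this one.
      # Placeholder values, e.g. the Docker host's routable address.
      hostname = "203.0.113.10"
      port = 6123

      # Bind address: the interface and port the process listens on
      # inside the container. Placeholder values.
      bind-hostname = "172.17.0.3"
      bind-port = 6123
    }
  }
}
```

As the FAQ notes, the network (for example a Docker port mapping) still has to translate the advertised address and port to the bound ones.
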
> >>
> >> Best,
> >> Max
> >>
> >> On Mon, Mar 14, 2016 at 5:42 AM, Deepak Jha <dkjhan...@gmail.com> wrote:
> >>
> >> > Hi Stephan & Ufuk,
> >> > Thanks for your response.
> >> >
> >> > Yes, there is a way to run docker (net = host mode) in which the
> >> > host machine's network stack is shared with the docker container.
> >> > Unfortunately it's not supported by AWS ECS.
> >> >
> >> > I do have one more question for you. Can you please explain to me what
> >> > happens when taskmanagers register themselves with the jobmanager in HA mode?
> >> > Does each taskmanager get connected to the jobmanager on a separate port? The
> >> > reason I'm asking is that if I run 2 taskmanagers (in separate docker
> >> > containers), they are able to attach themselves to the jobmanager (another
> >> > docker container; Flink HA setup using a remote zk cluster), but soon after
> >> > that they get disconnected. The logs are not very helpful either. I suspect
> >> > that each taskmanager gets connected on a new port, and since by default
> >> > docker does not expose all ports, this may happen. I do not see this
> >> > happen when I do not use docker containers.
> >> >
> >> > Here is the log that I saw in the jobmanager:
> >> >
> >> > 2016-03-12 08:55:55,010 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 5673db03e679 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 7eafcfddd6bd084f2ec5a32594603f4f. Current number of registered hosts is 1. *Current number of alive task slots is 1.*
> >> > 2016-03-12 08:57:42,676 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as 320338e15a7a44ee64dc03a40f04fcd7.
> >> > Current number of registered hosts is 2. *Current number of alive task slots is 2.*
> >> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> >> > 2016-03-12 08:57:48,422 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 1. Number of available slots 1.*
> >> > 2016-03-12 08:58:01,417 PST [WARN] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> >> > 2016-03-12 08:58:01,451 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> >> > 2016-03-12 08:58:01,451 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0.
> >> > Number of available slots 0.*
> >> > 2016-03-12 08:58:01,465 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as b5dbbc829854afa3ec5d8f0b6f9dbd03. Current number of registered hosts is 1. Current number of alive task slots is 1.*
> >> > 2016-03-12 08:58:03,383 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> >> > 2016-03-12 08:58:03,384 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - *Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.*
> >> > 2016-03-12 08:58:04,988 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
> >> > 2016-03-12 08:58:04,988 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as eac0ce12e6ec885863d3438d691f4ab2. Current number of registered hosts is 1. Current number of alive task slots is 1.
> >> > 2016-03-12 08:58:21,382 PST [WARN] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] a.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.17.0.3:6121] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> >> > 2016-03-12 08:58:21,388 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager wants to disconnect, because TaskManager akka://flink/user/taskmanager is disassociating.
> >> > 2016-03-12 08:58:21,388 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> >> > 2016-03-12 08:58:21,390 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as bda61dbd047d40889aa3868d5d4d86a9. Current number of registered hosts is 1. Current number of alive task slots is 1.
> >> > 2016-03-12 08:58:25,433 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager terminated.
> >> > 2016-03-12 08:58:25,434 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-18] o.a.f.r.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.17.0.3:6121/user/taskmanager. Number of registered task managers 0. Number of available slots 0.
> >> > 2016-03-12 08:58:28,947 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@172.17.0.3:6121/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
> >> > 2016-03-12 08:58:28,948 PST [INFO] ec2-54-173-231-120.compute-1.a [flink-akka.actor.default-dispatcher-20] o.a.f.r.instance.InstanceManager - Registered TaskManager at 7200a7da4da7 (akka.tcp://flink@172.17.0.3:6121/user/taskmanager) as d42ea5c6e0053935a0973d8536f3d8a5. Current number of registered hosts is 1. Current number of alive task slots is 1.
> >> >
> >> > On Fri, Mar 11, 2016 at 5:23 AM, Stephan Ewen <se...@apache.org> wrote:
> >> >
> >> > > Hi Deepak!
> >> > >
> >> > > We currently cannot split the bind address and the advertised address,
> >> > > because the Akka library only accepts packets sent explicitly to the bind
> >> > > address (not sure why Akka has this artificial limitation, but it is there).
> >> > >
> >> > > Can you bridge the container IP address to be visible from the outside?
> >> > >
> >> > > Stephan
> >> > >
> >> > > On Fri, Mar 11, 2016 at 1:03 PM, Ufuk Celebi <u...@apache.org> wrote:
> >> > >
> >> > > > Hey Deepak!
> >> > > >
> >> > > > Your description of Flink's behaviour is correct. To summarize:
> >> > > >
> >> > > > # Host Address
> >> > > >
> >> > > > If you specify a host address as an argument to the JVM (via
> >> > > > jobmanager.sh or the start-cluster.sh scripts) then that one is used.
> >> > > > If you don't, it falls back to the value configured in flink-conf.yaml
> >> > > > (what you describe).
> >> > > >
> >> > > > # Ports
> >> > > >
> >> > > > By default, a random port is used and published via ZooKeeper.
> >> > > > You can configure a port range only via recovery.jobmanager.port
> >> > > > (what you describe).
> >> > > >
> >> > > > ---
> >> > > >
> >> > > > Your proposal would likely solve the issue, but isn't it possible to
> >> > > > handle this outside of Flink? I've found this Stack Overflow question,
> >> > > > which should be related:
> >> > > >
> >> > > > http://stackoverflow.com/questions/26539727/giving-a-docker-container-a-routable-ip-address
> >> > > >
> >> > > > What's your opinion?
> >> >
> >> > --
> >> > Thanks,
> >> > Deepak Jha
> >
> > --
> > Sent from Gmail Mobile

--
Thanks,
Deepak Jha