[ https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659436#comment-15659436 ]
Philipp von dem Bussche commented on FLINK-2821:
------------------------------------------------

[~mxm] Sorry for my late reply, but it is good that you asked these questions, as I think I need to optimize my setup a bit. Whenever my test server goes down (which it does quite a bit lately), my Flink services won't come up without manual intervention, and I think that is not good. Since I have forgotten some of the details of my setup myself, I am going to outline them again below.

I took the Dockerfile from the Flink contrib GitHub project as a basis. In the Docker entrypoint I do this to set the JobManager listen address (I think this is still the default):

{code}sed -i -e "s/jobmanager.rpc.address: localhost/jobmanager.rpc.address: `hostname -f`/g" $FLINK_HOME/conf/flink-conf.yaml{code}

This leads to the JobManager doing this:

{code}Starting JobManager actor system at 172.17.0.23:6123{code}

The 172.x address is the IP address assigned by Docker. My TaskManager container can obviously reach this address, and I have an environment variable set on the TaskManager that points to it. However, this is not really dynamic: I have about 20 or so containers on my test system, and after the last reboot of the server the Docker IP address changed (it was .24 before). At that point the whole setup breaks.

Moving on to Rancher: you can define stacks, which are groupings of your containers. I have one stack for all the containers I need for doing data science things (well, maybe that's a bit of overselling, but anyway ;) ). A service in Rancher is a kind of wrapper around a container: you say that a service uses Docker image X, and if you need more than one container you scale the service up and down, and a service can run on different hosts. The name of the service representing the JobManager functionality is flink-jobmanager.

With the Rancher DNS I can access the service (and, since I only have one active container, essentially the container) by just connecting to <protocol>:flink-jobmanager. This works when I create the connection from within the same stack. If I were on my application stack and wanted to access Flink directly (I don't, because traffic goes from the webservice into Kafka first, which is already on the same stack), I could connect via <protocol>:flink-jobmanager.analyticsstack. This is quite cool, because I can leave out any references to hosts via environment variables or parameters and be pretty sure that my other services/containers are always resolvable.

However, the resolution is done against the Rancher IP, and the one for the JobManager in my setup is currently 10.42.9.68. So from my TaskManager container I can access all three of those IPs (the host IP, the Docker IP and the Rancher IP). I don't really want to use the host IP or the Docker IP, because that would make things too static. But when I have the JobManager bind to the Docker IP and try to connect to it via the Rancher IP, it complains. On the other hand, I can't have the JobManager bind to the Rancher IP, because that address is not available inside the container. It only exists in the Rancher context and then gets mapped/forwarded onto Docker and the 172.x address.

It seems I am currently not running the build where I just patched the Akka version, but I remember I did for a while and it worked fine. I also think this is the only way this could work, but I might be missing something.

Thanks for looking into this!
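As a rough illustration of the entrypoint substitution described in the comment, here is a minimal sketch of a helper that rewrites `jobmanager.rpc.address` whatever its current value is, rather than only when it is still `localhost`. The function name `set_jobmanager_address` is hypothetical and not part of the Flink contrib Dockerfile. The sketch assumes the address contains no `/` characters. It only changes the configured value; it does not change Akka's requirement that the address used to connect matches the address the JobManager bound to.

```shell
#!/bin/sh
# Hypothetical helper (not from the Flink contrib Dockerfile):
# rewrite jobmanager.rpc.address in a Flink configuration file.
#   $1 = path to flink-conf.yaml
#   $2 = address to write (hostname or IP, assumed to contain no "/")
set_jobmanager_address() {
    conf_file="$1"
    address="$2"
    tmp_file="${conf_file}.tmp"
    # Match whatever value is currently configured, not only "localhost",
    # so the substitution still applies when the container is restarted
    # with an already-edited configuration file.
    sed -e "s/^jobmanager\.rpc\.address:.*/jobmanager.rpc.address: ${address}/" \
        "$conf_file" > "$tmp_file" && mv "$tmp_file" "$conf_file"
}
```

In an entrypoint this might be called as `set_jobmanager_address "$FLINK_HOME/conf/flink-conf.yaml" "$(hostname -f)"`, which mirrors the original `sed` line but can be re-run safely after the address has changed.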

> Change Akka configuration to allow accessing actors from different URLs
> -----------------------------------------------------------------------
>
>                 Key: FLINK-2821
>                 URL: https://issues.apache.org/jira/browse/FLINK-2821
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>            Reporter: Robert Metzger
>            Assignee: Maximilian Michels
>
> Akka expects the actor's URL to be exactly matching.
> As pointed out here, cases where users were complaining about this:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
> - Proxy routing (as described here, send to the proxy URL, receiver
>   recognizes only original URL)
> - Using hostname / IP interchangeably does not work (we solved this by
>   always putting IP addresses into URLs, never hostnames)
> - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still
>   no solution to that (but seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a
> Flink bug. But I think we should track the resolution of the issue here
> anyway because it's affecting our users' satisfaction.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)