[jira] [Commented] (FLINK-2821) Change Akka configuration to allow accessing actors from different URLs

Philipp von dem Bussche (JIRA) Sat, 12 Nov 2016 02:21:10 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659436#comment-15659436
 ]


Philipp von dem Bussche commented on FLINK-2821:
------------------------------------------------

[~mxm] Sorry for my late reply, but it is good that you asked these 
questions, as I think I need to optimize my setup here a bit. Whenever my test 
server goes down (which it does quite often lately), my Flink services 
won't come up without manual intervention, and I think that is not good.
Since I have forgotten some of the details of my setup myself, I am just going 
to outline them again below:

I took the Dockerfile from the Flink contrib GitHub project as a basis.
In the Docker entrypoint I am doing this to set the JobManager listen address 
(I think this is still the default):

{code}sed -i -e "s/jobmanager.rpc.address: localhost/jobmanager.rpc.address: `hostname -f`/g" $FLINK_HOME/conf/flink-conf.yaml{code}

So this actually leads to the JobManager doing this:

{code}Starting JobManager actor system at 172.17.0.23:6123{code}

The 172.x address would be the IP address coming from Docker.

On my TaskManager container I can obviously access this address, and I 
actually have an environment variable set on the TaskManager that points to 
it. However, this is of course not really dynamic: I have about 20 or so 
containers on my test system, and after the last reboot of the server the 
Docker IP address changed (it was actually .24 before). So then this whole 
setup breaks.
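
As a rough sketch only, the entrypoint substitution shown earlier could be 
wrapped so that the address is taken from an environment variable and falls 
back to `hostname -f`. The variable name JOBMANAGER_ADDRESS and the function 
name set_jobmanager_address are hypothetical (they are not part of the Flink 
contrib image), and GNU sed is assumed for `-i`. This only changes what is 
written into the config file; it does not change how Akka matches addresses.

```shell
#!/bin/sh
# Sketch: write the JobManager RPC address into flink-conf.yaml.
# JOBMANAGER_ADDRESS and set_jobmanager_address are hypothetical names.

# set_jobmanager_address CONF_FILE ADDRESS
# Replaces the value of jobmanager.rpc.address, whatever it was before.
# ADDRESS is assumed to be a hostname or IP (no "/" characters).
set_jobmanager_address() {
    conf_file="$1"
    address="$2"
    sed -i -e "s/^jobmanager\.rpc\.address:.*/jobmanager.rpc.address: ${address}/" "$conf_file"
}

# Only touch the real config when it exists, so the function can also be
# sourced and tried out against a scratch file.
if [ -n "${FLINK_HOME:-}" ] && [ -f "$FLINK_HOME/conf/flink-conf.yaml" ]; then
    set_jobmanager_address "$FLINK_HOME/conf/flink-conf.yaml" \
        "${JOBMANAGER_ADDRESS:-$(hostname -f)}"
fi
```

Matching on the key rather than on the literal `localhost` value means the 
substitution still applies if the container is restarted with an already 
rewritten config file.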

Moving on to Rancher: you are able to define stacks (a stack is a grouping 
for your containers). I have one stack for all the containers I need for doing 
data science things (well, maybe that's a bit of overselling, but anyway ;) ). 
In Rancher, a service is another wrapper around a container: you say that a 
service uses Docker image X, you scale it up and down if you need more than 
one container, and a service can run on different hosts. The service 
representing the JobManager functionality is named flink-jobmanager. With the 
Rancher DNS I can access the service (and, since I only have one active 
container, essentially the container) by just connecting to 
<protocol>:flink-jobmanager. This is when I am creating the connection from 
within the same stack. If I were on my application stack and wanted to access 
Flink directly (I don't, because it goes from the webservice into Kafka first, 
which is already on the same stack), I could connect via 
<protocol>:flink-jobmanager.analyticsstack.
This is quite cool because I can leave out any references to hosts via 
environment variables or parameters, as I can be pretty sure that my other 
services/containers are always resolvable. However, the resolution is done 
against the Rancher IP, and the one for the JobManager in my setup currently 
is 10.42.9.68.

So from my TaskManager container I can access all three of those IPs (the Host 
IP, the Docker IP and the Rancher IP). However, I don't really want to go for 
the Host IP or the Docker IP because this would make things too static, but 
when I have the JobManager bind on the Docker IP and try to connect to it via 
the Rancher IP, it complains. 
On the other hand, I can't have the JobManager bind on the Rancher IP because 
that is not available inside the container; it is something available in the 
Rancher context that then gets mapped/forwarded onto Docker and the 172.x 
address.
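
To make the mismatch concrete, here is a toy model of the exact-match rule 
from the issue description. It is not Akka code; it only compares two address 
strings, using the Docker IP and the Rancher IP mentioned above. The 
`akka.tcp://flink@host:port` form and the function name accepts_message are 
assumptions made for illustration.

```shell
#!/bin/sh
# Toy model only: this is NOT Akka code. It mimics the exact-match rule by
# comparing the address a sender targeted with the address the receiver
# bound to.

# accepts_message BOUND_HOST TARGET_HOST [PORT]
# Succeeds only if both addresses are the same string.
accepts_message() {
    port="${3:-6123}"
    bound="akka.tcp://flink@$1:$port"
    target="akka.tcp://flink@$2:$port"
    [ "$bound" = "$target" ]
}

# JobManager bound on the Docker IP, TaskManager connecting via the Rancher IP.
if accepts_message 172.17.0.23 10.42.9.68; then
    echo "delivered"
else
    echo "dropped: target address does not match the bound address"
fi
```

Both IPs reach the same container at the network level, yet in this model the 
message is still rejected because the strings differ.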

It seems I am currently not running the build where I just patched the Akka 
version, but I remember I did for a while and it worked fine. I also think 
this would be the only way this could work, but I might be missing something. 
Thanks for looking into this!
    

> Change Akka configuration to allow accessing actors from different URLs
> -----------------------------------------------------------------------
>
>                 Key: FLINK-2821
>                 URL: https://issues.apache.org/jira/browse/FLINK-2821
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>            Reporter: Robert Metzger
>            Assignee: Maximilian Michels
>
> Akka expects the actor's URL to be exactly matching.
> As pointed out here, cases where users were complaining about this: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
>   - Proxy routing (as described here, send to the proxy URL, receiver 
> recognizes only original URL)
>   - Using hostname / IP interchangeably does not work (we solved this by 
> always putting IP addresses into URLs, never hostnames)
>   - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still 
> no solution to that (but seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a 
> Flink bug. But I think we should track the resolution of the issue here 
> anyway because it's affecting our users' satisfaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
