I think I may be getting somewhere with this. I have opened the blob.server.port and the query.server.port on the TaskManager service, and I can now connect to JobManager from nodes in the same subnet.
However, nodes that are located in different clouds don't seem to be able to resolve the 'jobmanager' host by name:

    ubuntu@osdc-swarm-worker-1:~$ sudo docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
    e6a26caf81b4        flink:latest        "/docker-entrypoin..."   21 seconds ago      Up 1 second         6123/tcp, 8081/tcp   taskmanager.7.k0nc3tb7pxv4ppfuaxg155ku5
    e12a280860a7        flink:latest        "/docker-entrypoin..."   21 seconds ago      Up 2 seconds        6123/tcp, 8081/tcp   taskmanager.8.si7f8wk132jn9z5hwbx568nbj
    b459162a8ef6        flink:latest        "/docker-entrypoin..."   22 seconds ago      Up 5 seconds        6123/tcp, 8081/tcp   taskmanager.3.x2s45mt0qyx2eucirxwj0wmyx

    ubuntu@osdc-swarm-worker-1:~$ sudo docker logs e12a280860a7
    Starting Task Manager
    config file:
    jobmanager.rpc.address: jobmanager
    jobmanager.rpc.port: 6123
    jobmanager.heap.mb: 1024
    taskmanager.heap.mb: 1024
    taskmanager.numberOfTaskSlots: 2
    taskmanager.memory.preallocate: false
    parallelism.default: 1
    jobmanager.web.port: 8081
    blob.server.port: 6124
    query.server.port: 6125
    Starting taskmanager as a console application on host e12a280860a7.
    2017-11-02 18:46:35,481 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2017-11-02 18:46:35,744 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
    2017-11-02 18:46:35,744 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
    2017-11-02 18:46:35,744 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink
    2017-11-02 18:46:35,744 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
    2017-11-02 18:46:35,745 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 1024 MiBytes
    2017-11-02 18:46:35,745 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /docker-java-home/jre
    2017-11-02 18:46:35,752 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms1024M
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx1024M
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
    2017-11-02 18:46:35,753 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
    2017-11-02 18:46:35,754 INFO org.apache.flink.runtime.taskmanager.TaskManager - /opt/flink/conf
    2017-11-02 18:46:35,754 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
    2017-11-02 18:46:35,754 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
    2017-11-02 18:46:35,757 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT]
    2017-11-02 18:46:35,781 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576
    2017-11-02 18:46:35,834 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf
    2017-11-02 18:46:35,843 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager
    2017-11-02 18:46:35,843 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
    2017-11-02 18:46:35,844 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
    2017-11-02 18:46:35,844 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
    2017-11-02 18:46:35,844 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
    2017-11-02 18:46:35,845 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
    2017-11-02 18:46:35,845 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
    2017-11-02 18:46:35,845 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
    2017-11-02 18:46:35,847 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
    2017-11-02 18:46:35,847 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
    2017-11-02 18:46:35,865 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, jobmanager
    2017-11-02 18:46:35,865 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
    2017-11-02 18:46:35,865 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
    2017-11-02 18:46:35,866 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
    2017-11-02 18:46:35,866 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
    2017-11-02 18:46:35,866 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
    2017-11-02 18:46:35,867 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
    2017-11-02 18:46:35,867 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
    2017-11-02 18:46:35,869 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
    2017-11-02 18:46:35,869 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
    2017-11-02 18:46:35,936 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE)
    2017-11-02 18:46:36,522 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
    java.net.UnknownHostException: jobmanager: Name or service not known
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
        at java.net.InetAddress.getAllByName(InetAddress.java:1192)
        at java.net.InetAddress.getAllByName(InetAddress.java:1126)
        at java.net.InetAddress.getByName(InetAddress.java:1076)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:173)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:138)
        at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:78)
        at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1663)
        at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1574)
        at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1572)
        at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
        at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1572)
        at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)

________________________________
From: Vergilio, Thalita
Sent: 02 November 2017 18:31:45
To: Piotr Nowojski
Cc: user@flink.apache.org
Subject: Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

Hi Piotr,

Thank you very much for your reply. Yes, I have tried to open these ports when I create the services. If I create them with:

    docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 -p 6123:6123 -p 48081:48081 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager
    docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 6121:6121 -p 6122:6122 --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager

I still get the same issue.

Thank you very much for taking the time to look at this.

Best wishes,
Thalita

________________________________
From: Piotr Nowojski <pi...@data-artisans.com>
Sent: 02 November 2017 14:26:32
To: Vergilio, Thalita
Cc: user@flink.apache.org
Subject: Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

Did you try to expose the required ports that are listed in the README when starting the containers?

https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

Ports:
• The Web Client is on port 48081
• JobManager RPC port 6123 (default, not exposed to host)
• TaskManagers RPC port 6122 (default, not exposed to host)
• TaskManagers Data port 6121 (default, not exposed to host)

Piotrek

On 2 Nov 2017, at 14:44, javalass <t.vergilio4...@student.leedsbeckett.ac.uk> wrote:

I am using the Docker-Flink project in:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

I am creating the services with the following commands:

    docker network create -d overlay overlay
    docker service create --name jobmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager -p 8081:8081 --network overlay --constraint 'node.hostname == ubuntu-swarm-manager' flink jobmanager
    docker service create --name taskmanager --env JOB_MANAGER_RPC_ADDRESS=jobmanager --network overlay --constraint 'node.hostname != ubuntu-swarm-manager' flink taskmanager

I wonder if there's any configuration I'm missing. This is the error I get:

    - Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
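The stack trace shows the TaskManager failing inside java.net.InetAddress.getByName, i.e. while looking up the name 'jobmanager', before any connection to port 6123 is attempted. Name resolution and port reachability can therefore be checked separately. Below is a minimal diagnostic sketch in Python. It assumes Python 3 is available in some container attached to the same overlay network (the flink image itself may not ship Python). The script, its function names and the default port list are hypothetical: the ports are simply the jobmanager.rpc.port, blob.server.port and query.server.port values printed in the TaskManager log, not the output of any Flink or Docker tool.

```python
import socket
import sys

# Hypothetical defaults for this sketch: the ports configured in the
# TaskManager's flink-conf.yaml as printed in the log
# (jobmanager.rpc.port, blob.server.port, query.server.port).
DEFAULT_PORTS = (6123, 6124, 6125)


def resolve(host):
    """Return the sorted addresses `host` resolves to, or [] if lookup fails.

    This mirrors the step that fails in the stack trace, where
    InetAddress.getByName("jobmanager") throws UnknownHostException.
    """
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check(host="jobmanager", ports=DEFAULT_PORTS):
    """Print a short report and return (addresses, {port: reachable})."""
    addresses = resolve(host)
    if not addresses:
        print(f"{host}: name does not resolve (same symptom as the UnknownHostException)")
        return addresses, {}
    print(f"{host} resolves to {', '.join(addresses)}")
    # Only probe ports once the name resolves; otherwise the connect
    # attempts would fail for the same DNS reason.
    reachable = {port: port_open(host, port) for port in ports}
    for port, ok in reachable.items():
        print(f"  {host}:{port} {'open' if ok else 'unreachable'}")
    return addresses, reachable


if __name__ == "__main__":
    check(sys.argv[1] if len(sys.argv) > 1 else "jobmanager")
```

It could be run, for example, as: python3 check_jobmanager.py jobmanager (file name hypothetical). Keeping the two checks apart matters because the thread shows two different symptoms: registration attempts timing out, and the name not resolving at all. If 'jobmanager' resolves on workers in the manager's subnet but not on workers in the other clouds, that would suggest a problem with service discovery on the overlay network between those hosts rather than with which ports are published with -p.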