Just an update: by changing the JOB_MANAGER_RPC_ADDRESS to the public IP of the JobManager and exposing port 6123 as {{PUBLIC_IP}}:6123:6123, I managed to get the TaskManagers from different nodes and even different subnets to talk to the JobManager.
This is how I created the services:

    docker network create -d overlay overlay

    docker service create --name jobmanager \
        --env JOB_MANAGER_RPC_ADDRESS={{PUBLIC_IP}} \
        -p 8081:8081 -p {{PUBLIC_IP}}:6123:6123 -p 48081:48081 -p 6124:6124 -p 6125:6125 \
        --network overlay \
        --constraint 'node.hostname == ubuntu-swarm-manager' \
        flink jobmanager

    docker service create --name taskmanager \
        --env JOB_MANAGER_RPC_ADDRESS={{PUBLIC_IP}} \
        -p 6121:6121 -p 6122:6122 \
        --network overlay \
        --constraint 'node.hostname != ubuntu-swarm-manager' \
        flink taskmanager

However, I am still encountering errors further down the line. When I submit a job using the Web UI, it fails because the JobManager can't talk to the TaskManager on port 35033. I presume this is the taskmanager.data.port, which needs to be set to a range and this range exposed when I create the service? Are there any other ports that I need to open at service creation time?

Connecting the channel failed: Connecting to remote task manager + '/{{IP_ADDRESS_OF_MANAGER}}:35033' has failed. This might indicate that the remote task manager has been lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196)
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:131)
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:83)
    at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:59)
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:112)
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:433)
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:455)
    at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:91)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:213)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
    at java.lang.Thread.run(Thread.java:748)

________________________________
From: Piotr Nowojski <pi...@data-artisans.com>
Sent: 02 November 2017 14:26:32
To: Vergilio, Thalita
Cc: user@flink.apache.org
Subject: Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

Did you try to expose required ports that are listed in the README when starting the containers?
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

Ports:
• The Web Client is on port 48081
• JobManager RPC port 6123 (default, not exposed to host)
• TaskManagers RPC port 6122 (default, not exposed to host)
• TaskManagers Data port 6121 (default, not exposed to host)

Piotrek

On 2 Nov 2017, at 14:44, javalass <t.vergilio4...@student.leedsbeckett.ac.uk> wrote:

I am using the Docker-Flink project in:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

I am creating the services with the following commands:

    docker network create -d overlay overlay

    docker service create --name jobmanager \
        --env JOB_MANAGER_RPC_ADDRESS=jobmanager \
        -p 8081:8081 \
        --network overlay \
        --constraint 'node.hostname == ubuntu-swarm-manager' \
        flink jobmanager

    docker service create --name taskmanager \
        --env JOB_MANAGER_RPC_ADDRESS=jobmanager \
        --network overlay \
        --constraint 'node.hostname != ubuntu-swarm-manager' \
        flink taskmanager

I wonder if there's any configuration I'm missing. This is the error I get:

- Trying to register at JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
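
On the data-port question raised at the top of this thread: as far as I know, in Flink releases of this period taskmanager.data.port defaults to 0, which lets the OS choose a free port, and that would be consistent with the random-looking 35033 in the error. Below is a minimal, untested sketch of pinning the TaskManager ports to the values the taskmanager service already publishes (6121 and 6122). FLINK_CONF is a hypothetical variable, not something the image defines; it stands for whichever flink-conf.yaml the TaskManager container actually reads, and how that file gets into the image is an assumption left open here.

```shell
#!/bin/sh
# Sketch only: pin the TaskManager ports so they match the published ones.
# FLINK_CONF is a hypothetical variable; point it at the flink-conf.yaml
# that the TaskManager container actually reads.
CONF="${FLINK_CONF:-./flink-conf.yaml}"

# Drop any earlier settings for these two keys so each appears only once.
touch "$CONF"
grep -v -e '^taskmanager\.data\.port:' -e '^taskmanager\.rpc\.port:' "$CONF" > "$CONF.tmp" || true
mv "$CONF.tmp" "$CONF"

# 6121 and 6122 are the ports already passed to -p for the taskmanager service.
cat >> "$CONF" <<'EOF'
taskmanager.data.port: 6121
taskmanager.rpc.port: 6122
EOF
```

Each TaskManager would still have to be reachable on those ports at the address it advertises, so with several TaskManagers behind the swarm routing mesh this may not be sufficient on its own.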