Thalita Vergilio created FLINK-7961:
---------------------------------------

             Summary: Docker-Flink with Docker Swarm doesn't work when machines are in different clouds
                 Key: FLINK-7961
                 URL: https://issues.apache.org/jira/browse/FLINK-7961
             Project: Flink
          Issue Type: Bug
          Components: Docker
    Affects Versions: 1.3.0
            Reporter: Thalita Vergilio
            Priority: Normal


Task Managers can't find Job Manager by name. Maybe some additional Docker configuration is needed?

I am running the standard setup and create-docker-swarm-service.sh script from the Docker Flink project:
https://github.com/apache/flink/blob/master/flink-contrib/docker-flink/create-docker-swarm-service.sh

This is the log from one of the Task Manager's containers:

{{Starting Task Manager
config file:
jobmanager.rpc.address: flink-jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
blob.server.port: 6124
query.server.port: 6125
Starting taskmanager as a console application on host c42a6093f7bb.
2017-11-01 11:20:51,459 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC)
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 1024 MiBytes
2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /docker-java-home/jre
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms1024M
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx1024M
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - /opt/flink/conf
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
2017-11-01 11:20:51,528 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-11-01 11:20:51,532 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576
2017-11-01 11:20:51,548 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-01 11:20:51,552 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-01 11:20:51,552 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-01 11:20:51,552 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-01 11:20:51,553 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 1024
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 1024
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 2
2017-11-01 11:20:51,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2017-11-01 11:20:51,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2017-11-01 11:20:51,585 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to flink (auth:SIMPLE)
2017-11-01 11:20:51,621 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
java.net.UnknownHostException: flink-jobmanager: Name or service not known
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:173)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:138)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:78)
    at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1663)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1574)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anon$2.call(TaskManager.scala:1572)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1572)
    at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
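
The stack trace above fails inside java.net.InetAddress.getByName, so the same lookup can be tried in isolation from a Task Manager container to separate a name-resolution problem from a Flink problem. The following is only a minimal sketch: the class name JobManagerLookupCheck is hypothetical and not part of Flink, and it assumes the class is compiled somewhere with a JDK and then run with the container's JRE.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, not part of Flink: repeats the
// InetAddress.getByName lookup that fails in the stack trace.
final class JobManagerLookupCheck {

    /** Returns the resolved IP address as text, or null if the name cannot be resolved. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            // Same exception type as in the Task Manager log.
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the value of jobmanager.rpc.address from the log.
        String host = args.length > 0 ? args[0] : "flink-jobmanager";
        String address = resolve(host);
        if (address == null) {
            System.out.println(host + ": cannot be resolved from this host");
        } else {
            System.out.println(host + " resolves to " + address);
        }
    }
}
```

Run with no arguments, it checks flink-jobmanager; passing another name as the first argument checks that name instead. If the name does not resolve here either, the problem lies in name resolution between the containers rather than in Flink itself.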