Hello,

We're having a lot of issues trying to submit a job remotely using the Flink command-line client. We have tried several configurations, but in all of them we get errors from Akka while trying to connect. I will try to summarise the configurations we've tried.
- Flink 1.3.0 deployed in a Docker container on a Mesos cluster (using Marathon).
- This Flink instance has the property jobmanager.rpc.address set to a hostname (i.e. something like ip-XXXXXXXXX.eu.west-1.compute.internal).
- The same Flink version is used for the remote client (e.g. on my laptop).

When I try to submit the job with the command flink run -m myHostName:myPort (the same values as in jobmanager.rpc.address and jobmanager.rpc.port), after waiting for some time I get the trace at the end of this email. On the Flink side we get this error from Akka:

Association with remote system [akka.tcp://flink@10.203.23.24:24469] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@10.203.23.24:24469]] Caused by: [Connection refused: /10.203.23.24:24469]

After reading a bit, it seems there are some problems related to Akka resolving hostnames to IPs, so we decided to start the same Flink with jobmanager.rpc.address set to the IP directly (i.e. something like XX.XXX.XX.XX). In this case I get the same trace (at the end of the email) on the client side, and this one from the Flink server:

Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,SubmitJob(JobGraph(jobId: b25d5c5ced962632abc5ee9ef867792e),DETACHED)) because the expected leader session ID b4f53899-5d70-467e-8e9d-e56eeb60b6e3 did not equal the received leader session ID 00000000-0000-0000-0000-000000000000.

We have tried some other things, but without success. Any clue that could help us?

Thanks in advance!
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: JobManager did not respond within 60000 milliseconds
    at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
    at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
    at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
    at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
    at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
Caused by: org.apache.flink.runtime.client.JobTimeoutException: JobManager did not respond within 60000 milliseconds
    at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:426)
    at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:451)
    ... 15 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at scala.concurrent.Await.result(package.scala)
    at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:423)
    ... 16 more
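
For reference, since the Akka error reports "Connection refused", one thing that may help narrow this down is a plain TCP reachability check between the machines involved. The following is only a minimal sketch, not part of Flink: the function name can_connect is hypothetical, and the host and port in the usage comment are the placeholders used earlier in this email. It only tells whether a TCP connection can be opened; it says nothing about whether Akka accepts the address.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


# Example usage (placeholders, replace with the real values):
#   can_connect("myHostName", myPort)       # client -> JobManager RPC port
#   can_connect("10.203.23.24", 24469)      # address from the Akka error, run on the Flink side
```

Running it once from the client towards the JobManager address and once from the Flink container towards the address in the Akka error should at least show in which direction the connection is being refused.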