Hi everyone, I’ve got the following problem, when I’m trying to submit new job 
and if cluster has not enough resources, job submission fails with the 
following exception
But in YARN job hangs and wait’s for requested resources. When resources become 
available, job successfully runs.

What can I do to be sure that job startup is completed successfully or 
completely failed  ?

Thx.

The program finished with the following 
exception:\n\njava.lang.RuntimeException: Unable to tell application master to 
stop once the specified job has been finised\n\tat 
org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:177)\n\tat
 
org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:201)\n\tat
 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)\n\tat 
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)\n\tat
 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)\n\tat 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838)\n\tat 
org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)\n\tat 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)\n\tat
 org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)\n\tat 
org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)\n\tat 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)\n\tat
 java.security.AccessController.doPrivileged(Native Method)\n\tat 
javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)\n\tat
 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)\n\tat
 org.apache.flink.client.CliFrontend.main(CliFrontend.java:1129)\nCaused by: 
org.apache.flink.util.FlinkException: Could not connect to the leading 
JobManager. Please check that the JobManager is running.\n\tat 
org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:789)\n\tat
 
org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:171)\n\t...
 15 more\nCaused by: 
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not 
retrieve the leader gateway.\n\tat 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)\n\tat
 
org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:784)\n\t...
 16 more\nCaused by: java.util.concurrent.TimeoutException: Futures timed out 
after [10000 milliseconds]\n\tat 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)\n\tat 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)\n\tat 
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)\n\tat 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)\n\tat
 scala.concurrent.Await$.result(package.scala:190)\n\tat 
scala.concurrent.Await.result(package.scala)\n\tat 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:77)\n\t...
 17 more", "stderr_lines": ["", 
"------------------------------------------------------------", " The program 
finished with the following exception:", "", "java.lang.RuntimeException: 
Unable to tell application master to stop once the specified job has been 
finised", "\tat 
org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:177)",
 "\tat 
org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:201)", 
"\tat 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)", 
"\tat 
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)",
 "\tat 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)", 
"\tat 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838)", 
"\tat org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)", "\tat 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)", 
"\tat org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)", "\tat 
org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)", "\tat 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)",
 "\tat java.security.AccessController.doPrivileged(Native Method)", "\tat 
javax.security.auth.Subject.doAs(Subject.java:422)", "\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)",
 "\tat 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)",
 "\tat org.apache.flink.client.CliFrontend.main(CliFrontend.java:1129)", 
"Caused by: org.apache.flink.util.FlinkException: Could not connect to the 
leading JobManager. Please check that the JobManager is running.", "\tat 
org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:789)",
 "\tat 
org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:171)",
 "\t... 15 more", "Caused by: 
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not 
retrieve the leader gateway.", "\tat 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)",
 "\tat 
org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:784)",
 "\t... 16 more", "Caused by: java.util.concurrent.TimeoutException: Futures 
timed out after [10000 milliseconds]", "\tat 
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)", "\tat 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)", "\tat 
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)", "\tat 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)"

Reply via email to