Ah, it seems to be something with the custom Flink client build that we
run...

Still don't know why, but if I use the normal client once the job is
started, it works.
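
For reference, the command that works with the stock client is just the
standard savepoint trigger (the job ID and YARN application ID here are
placeholders, not values from our setup):

    flink savepoint <jobId> -yid <yarnApplicationId>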

Gyula

Gyula Fóra <gyula.f...@gmail.com> wrote (on Wed, Dec 5, 2018, at 9:50):

> I get the following error when, for example, trying to take a savepoint of a job:
>
>  The program finished with the following exception:
>
> org.apache.flink.util.FlinkException: Could not connect to the leading JobManager. Please check that the JobManager is running.
>     at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:960)
>     at org.apache.flink.client.program.ClusterClient.triggerSavepoint(ClusterClient.java:737)
>     at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:771)
>     at org.apache.flink.client.cli.CliFrontend.lambda$checkpoint$10(CliFrontend.java:760)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1044)
>     at org.apache.flink.client.cli.CliFrontend.checkpoint(CliFrontend.java:759)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1127)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$12(CliFrontend.java:1188)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1188)
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
>     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:83)
>     at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:955)
>     ... 12 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>     at scala.concurrent.Await$.result(package.scala:190)
>     at scala.concurrent.Await.result(package.scala)
>     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:81)
>     ... 13 more
>
> There is no error when trying the same operation with the 1.7 client on a
> 1.6 (legacy execution) job. This looks like a firewall issue, so I'm trying
> to pin the ports to the open ranges, but I'm not sure what I have to change.
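>
> To rule the firewall in or out, a plain TCP check from the client host
> against the port the cluster reports should be enough (host and port here
> are placeholders, not values from our setup):
>
>     nc -vz <jobmanager-host> <port>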
>
> Gyula
>
> Gyula Fóra <gyula.f...@gmail.com> wrote (on Tue, Dec 4, 2018, at 15:11):
>
>> Hi!
>>
>> We have been running Flink on YARN for quite some time, and historically
>> we have specified port ranges so that the client can access the cluster:
>>
>> yarn.application-master.port: 100-200
>>
>> Now we have updated to Flink 1.7 and are trying to migrate away from the
>> legacy execution mode, but we have run into a problem: we cannot connect
>> to the running job from the command-line client like we could before.
>>
>> What is the equivalent port configuration that would make sure that the
>> ports the client needs to reach land between 100 and 200?
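>>
>> Our best guess so far, assuming the non-legacy client talks to the cluster
>> through the REST endpoint rather than the Akka-based application master
>> port, is something like the following in flink-conf.yaml, but we are not
>> sure this is the right option (or that our version accepts ranges for it):
>>
>>     # port or range the JobManager's REST server binds to;
>>     # the CLI goes through this endpoint in the new mode
>>     rest.bind-port: 100-200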
>>
>> Thanks,
>> Gyula
>>
>
