Re: Leader Retrieval Timeout with HA Job Manager

Timo Walther Tue, 15 May 2018 07:00:05 -0700

Can you change the log level to DEBUG and share the logs with us? Maybe Till (in CC) has some idea?


Regards,
Timo



On 15.05.18 at 15:18, Jason Kania wrote:

Hi Timo,

Thanks for the response.
Yes, we are running with a cloud provider, a cloud system provided by our national government for R&D purposes. The thing is that we also have Kafka and Cassandra on the same nodes and they have no issues in this environment, it is just Flink in an HA configuration that has problems so it is strange.
Is there any additional logging available for analysis of these sorts of scenarios? The details in the current logs are insufficient to know what is happening.
Thanks,

Jason
On Tuesday, May 15, 2018, 7:51:40 a.m. EDT, Timo Walther <twal...@apache.org> wrote:
Hi Jason,
this sounds more like a network connection/firewall issue to me. Can you tell us a bit more about your environment? Are you running your Flink cluster on a cloud provider?
Regards,
Timo


On 15.05.18 at 05:15, Jason Kania wrote:
Hi,
I am using the 1.4.2 release on Ubuntu and attempting to make use of an HA Job Manager, but unfortunately using HA functionality prevents job submission with the following error:
java.lang.RuntimeException: Failed to retrieve JobManager address
        at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:308)
        at org.apache.flink.client.program.StandaloneClusterClient.getClusterIdentifier(StandaloneClusterClient.java:86)
        at org.apache.flink.client.CliFrontend.createClient(CliFrontend.java:921)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:264)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054)
        at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101)
        at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:113)
        at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:302)
        ... 8 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at scala.concurrent.Await.result(package.scala)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:111)
        ... 9 more
This seems to also be tied to problems in having the TaskManager register. I have to repeatedly restart the TaskManager until it finally connects to the Job Manager. Most times it doesn't connect and doesn't complain, making the determination of the root cause more difficult. The cluster is not busy and I have tried both with IP addresses and host names to determine if name resolution issues were the cause, but both situations are the same.
I have also noticed that if 2 job managers are launched on different nodes in the same cluster, they both come back with logging indicating that they are the leader, so they are not talking to each other effectively and the logging is not even indicating that they are even attempting to talk with one another.
Lastly, the error "Could not retrieve the leader address and leader session ID." is a very poor error because it does not tell where it is attempting to get the information from.
Any suggestions would be appreciated.

Thanks,

Jason
