Hi,

I'm having some trouble with Flink jobmanagers in an HA setup within OpenShift.

I have three jobmanagers, a ZooKeeper cluster and a load balancer (OpenShift/Kubernetes Route) for the web UI / REST server on the jobmanagers. Everything works fine as long as the load balancer connects to the leader. However, when the leader changes and the load balancer connects to a non-leader, that jobmanager redirects to the leader using the IP address of the host. Since routing in our network is done using hostnames, the client cannot reach the node by its IP address and the request times out.
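For anyone who wants to reproduce this, the redirect can be observed with something like the sketch below (Python, standard library only). It sends a single request to a jobmanager without following redirects and checks whether the host in the returned Location header is an IP literal. The helper names are made up for illustration, and the host, port 8081 and path are placeholders for whatever your setup uses.

```python
import http.client
import ipaddress
from urllib.parse import urlsplit


def location_uses_ip(location):
    """Return True if the host part of a Location header value is an IP literal."""
    host = urlsplit(location).hostname
    if host is None:
        # Relative redirect such as "/overview": no host to inspect.
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not parseable as IPv4/IPv6, so treat it as a hostname.
        return False


def fetch_redirect_target(host, port=8081, path="/"):
    """Send one GET without following redirects; return the Location header or None.

    host, port and path are placeholders (assumptions), not values from this setup.
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_redirect_target` with the hostname of a non-leader jobmanager and passing the result (if it is not None) to `location_uses_ip` shows whether the redirect points at an IP address or a hostname.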

So I have a few questions:
1. Why is Flink using IP addresses instead of the hostnames that are configured in the config? At other times it does use the hostname, for example in the info sent to ZooKeeper.
2. Is there another way of coping with connections to non-leaders instead of redirects? Maybe proxying through a non-leader to the leader?

Cheers,
Jeroen
