Piyush Narang created FLINK-17443:
-------------------------------------

             Summary: Flink's ZK in HA mode setup is unable to start up if any 
of the zk hosts are unreachable
                 Key: FLINK-17443
                 URL: https://issues.apache.org/jira/browse/FLINK-17443
             Project: Flink
          Issue Type: Bug
            Reporter: Piyush Narang


We occasionally hit an issue where our Flink cluster will not start up if any of 
the ZooKeeper hosts passed in the "high-availability.zookeeper.quorum" config 
setting are unreachable. A sample error is shown below.

This error seems to stem from us being on an older ZooKeeper dependency version 
(3.4.10). It has been fixed in the 3.4.x branch as part of 
https://issues.apache.org/jira/browse/ZOOKEEPER-1576 
([https://github.com/apache/zookeeper/commit/be1409cc9a14ac2e28693e0e02a0ba6d9713565e]).
 
{code:java}
java.net.UnknownHostException: zk01-pa4.hpc.criteo.prod: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
    at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
    at org.apache.flink.shaded.curator.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
    at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
    at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
    at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
    at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
    at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.start(ConnectionState.java:109)
    at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
    at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
    at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:131)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:123)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
{code}
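
The trace shows the lookup failing inside the StaticHostProvider constructor, so a single unresolvable host aborts client creation. For illustration only, here is a minimal sketch of the more tolerant behaviour one would expect: skip hosts that fail to resolve and give up only if none resolve. This is not the actual ZooKeeper or Flink code. The class TolerantQuorumResolver, the Lookup interface and the resolveQuorum method are hypothetical names introduced for this sketch, and IPv6 literals are not handled.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class TolerantQuorumResolver {

    /** Hypothetical lookup hook so the sketch can be exercised without real DNS. */
    public interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /**
     * Resolves every host in a comma-separated quorum string such as
     * "host1:2181,host2:2181". Hosts that fail to resolve are skipped
     * instead of aborting on the first failure.
     */
    public static List<InetAddress> resolveQuorum(String quorum, Lookup lookup) {
        List<InetAddress> resolved = new ArrayList<>();
        for (String entry : quorum.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Strip an optional ":port" suffix (assumes hostnames or IPv4 only).
            int colon = trimmed.lastIndexOf(':');
            String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
            try {
                for (InetAddress address : lookup.resolve(host)) {
                    resolved.add(address);
                }
            } catch (UnknownHostException e) {
                // Tolerate a single bad host: report it and keep going.
                System.err.println("Skipping unresolvable ZooKeeper host: " + host);
            }
        }
        if (resolved.isEmpty()) {
            // Nothing usable at all: failing here is still the right outcome.
            throw new IllegalArgumentException(
                    "No ZooKeeper host in the quorum could be resolved: " + quorum);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Uses the real JDK resolver; the second hostname is a placeholder.
        System.out.println(
                resolveQuorum("localhost:2181,no-such-host.invalid:2181",
                        InetAddress::getAllByName));
    }
}
```

With this approach, startup fails only when every host in the quorum string is unresolvable, which is the behaviour we would expect from an HA setup.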



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
