[ https://issues.apache.org/jira/browse/FLINK-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095551#comment-17095551 ]
Piyush Narang commented on FLINK-17443:
---------------------------------------

Thanks [~chesnay], I'd like to take a stab at that, as we do need to work around this issue on our end. If you could assign the ticket to me, I'll pick it up and give it a shot.

> Flink's ZK in HA mode setup is unable to start up if any of the zk hosts are
> unreachable
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-17443
>                 URL: https://issues.apache.org/jira/browse/FLINK-17443
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: Piyush Narang
>            Priority: Major
>              Labels: pull-request-available
>
> We occasionally hit an issue where our Flink cluster will not start up if any
> of the ZooKeeper hosts passed in the "high-availability.zookeeper.quorum"
> config setting are unreachable. This seems to stem from us being on an older
> ZooKeeper release (3.4.10). The problem has been fixed in the 3.4.x branch as
> part of https://issues.apache.org/jira/browse/ZOOKEEPER-1576
> ([https://github.com/apache/zookeeper/commit/be1409cc9a14ac2e28693e0e02a0ba6d9713565e]).
> A sample of the error we see is shown below.
>
> {code:java}
> java.net.UnknownHostException: zk01-pa4.hpc.criteo.prod: Name or service not known
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>     at org.apache.flink.shaded.curator.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
>     at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
>     at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
>     at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
>     at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.start(ConnectionState.java:109)
>     at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
>     at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:131)
>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:123)
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
> {code}
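
The trace shows the failure coming from `InetAddress.getAllByName`, called inside the `StaticHostProvider` constructor, so a single unresolvable name aborts client creation. Until the ZooKeeper dependency is upgraded, one possible stopgap is to drop unresolvable hosts from the quorum string before handing it to the client. The following is only a minimal sketch of that idea, not the fix adopted for this ticket: `QuorumFilter`, `resolves`, and `filterResolvable` are hypothetical names that exist in neither Flink nor ZooKeeper, and the sketch assumes a plain comma-separated `host:port` list with no chroot suffix and no IPv6 literals.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper; not part of Flink or ZooKeeper. */
final class QuorumFilter {
    private QuorumFilter() {}

    /** Returns true if the host name can currently be resolved via DNS. */
    static boolean resolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /**
     * Keeps only the quorum entries whose host passes the given check.
     * Assumes entries look like "host" or "host:port", separated by commas.
     */
    static String filterResolvable(String quorum, Predicate<String> resolvable) {
        List<String> kept = new ArrayList<>();
        for (String entry : quorum.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments such as a trailing comma
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
            if (resolvable.test(host)) {
                kept.add(trimmed);
            }
        }
        if (kept.isEmpty()) {
            // Fail loudly rather than start a client with an empty host list.
            throw new IllegalStateException(
                    "No ZooKeeper host in quorum could be resolved: " + quorum);
        }
        return String.join(",", kept);
    }
}
```

A caller would pass the configured "high-availability.zookeeper.quorum" value through `QuorumFilter.filterResolvable(quorum, QuorumFilter::resolves)` before starting the client. Taking the resolver as a `Predicate` keeps the DNS lookup swappable in tests. Note that a host dropped at startup stays out of the connect string for the lifetime of that client, which is a trade-off the upstream ZooKeeper fix does not appear to share.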