[ https://issues.apache.org/jira/browse/KAFKA-12513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krzysztof Piecuch resolved KAFKA-12513.
---------------------------------------
    Resolution: Invalid

I've just read the docs; it looks like everything is fine on the Kafka and ZooKeeper side. Sorry for the confusion.

> Kafka zookeeper client can't connect when the first zookeeper server is offline
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-12513
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12513
>             Project: Kafka
>          Issue Type: Bug
>          Components: zkclient
>    Affects Versions: 2.3.1, 2.4.1, 2.7.0
>         Environment: kafka_2.13-2.7.0, kernel 5.4.0-52-generic (Ubuntu), Scala 2.13.3-400
>            Reporter: Krzysztof Piecuch
>            Priority: Critical
>
> The Kafka ZooKeeper client library will not connect to any of the ZooKeeper servers in the
> "zookeeper string" when the first one is offline. This causes the cluster to crash hard, and
> to get the cluster back into a healthy state the first ZooKeeper node must be resurrected.
> The crash does not always happen immediately after zk0 goes offline, because Kafka might
> still have connections established to other ZooKeeper instances. When such a connection
> gets dropped and Kafka needs to reconnect, everything crashes hard.
>
> Demo:
> This works:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> Topic: duma  PartitionCount: 6  ReplicationFactor: 3  Configs: compression.type=uncompressed,retention.bytes=322122547200
>     Topic: duma  Partition: 0  Leader: 1  Replicas: 1,0,2  Isr: 1,0,2
>     Topic: duma  Partition: 1  Leader: 2  Replicas: 2,1,0  Isr: 0,1,2
>     Topic: duma  Partition: 2  Leader: 0  Replicas: 0,2,1  Isr: 0,1,2
>     Topic: duma  Partition: 3  Leader: 1  Replicas: 1,2,0  Isr: 1,0,2
>     Topic: duma  Partition: 4  Leader: 2  Replicas: 2,0,1  Isr: 1,0,2
>     Topic: duma  Partition: 5  Leader: 0  Replicas: 0,1,2  Isr: 0,1,2
> {code}
> Now let's mess with the zookeeper string and see how zookeeper client reacts:
> Changing the last server in the zookeeper string works as expected,
> {{kafka-topics.sh}} connected to zookeeper but couldn't find the topic
> (because of bogus zookeeper string):
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,1.1.1.1:2181/hex8c --describe --topic duma
> Error while executing topic command : Topic 'duma' does not exist as expected
> [2021-03-20 23:01:45,535] ERROR java.lang.IllegalArgumentException: Topic 'duma' does not exist as expected
>     at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:484)
>     at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:390)
>     at kafka.admin.TopicCommand$.main(TopicCommand.scala:67)
>     at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$)
> {code}
> However, in case the first server in the zookeeper cluster is unavailable
> zookeeper client won't connect to any of the zookeepers:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper 1.1.1.1:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> [2021-03-20 23:02:43,888] WARN Client session timed out, have not heard from server in 30012ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
> Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
>     at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
>     at kafka.zookeeper.ZooKeeperClient$$Lambda$31.000000005D399170.apply$mcV$sp(Unknown Source)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>     at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
>     at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
>     at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
>     at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:321)
>     at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
>     at kafka.admin.TopicCommand.main(TopicCommand.scala)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
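
A note on why the report was likely resolved as Invalid: the connection string format documented for ZooKeeper clients and for Kafka's `zookeeper.connect` appends the chroot path once, after the last host (`host1:port1,host2:port2,host3:port3/chroot`), whereas the commands in the demo repeat `/hex8c` after every host. The sketch below is a hypothetical shell helper (`split_zk_connect` is not part of Kafka or ZooKeeper). It assumes the client treats everything from the first `/` onward as the chroot, which is an assumption, though it would be consistent with the demo: only the first host is ever contacted, and changing a later entry changes the chroot, so the topic is not found.

```shell
#!/bin/sh
# Hypothetical helper, NOT Kafka or ZooKeeper code.
# ASSUMPTION: the host list ends at the first '/', and everything from that
# '/' onward is taken as the chroot path.
split_zk_connect() {
    conn="$1"
    case "$conn" in
        */*)
            hosts="${conn%%/*}"     # text before the first '/'
            chroot="/${conn#*/}"    # the first '/' and everything after it
            ;;
        *)
            hosts="$conn"           # no '/' at all: no chroot
            chroot=""
            ;;
    esac
    printf 'hosts=%s chroot=%s\n' "$hosts" "$chroot"
}

# Form used in the report: chroot repeated after every host.
split_zk_connect "1.1.1.1:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c"

# Documented form: chroot appended once, after the last host.
split_zk_connect "1.1.1.1:2181,zk1.gambit:2181,zk2.gambit:2181/hex8c"
```

Under that assumption, the first call yields a single host (`1.1.1.1:2181`) with the remaining servers folded into the chroot, while the second yields all three hosts and the chroot `/hex8c`, so a client could fall back to `zk1.gambit` or `zk2.gambit` when the first entry is unreachable.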