[ 
https://issues.apache.org/jira/browse/KAFKA-12513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Piecuch resolved KAFKA-12513.
---------------------------------------
    Resolution: Invalid

I've just read the docs; it looks like everything is fine on the Kafka and
ZooKeeper side.

Sorry for the confusion.
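
A note for readers hitting the same symptom: the documented connection string
format lists the chroot once, after the last host
(host1:port1,host2:port2,host3:port3/chroot), not after every host. The sketch
below is a simplified Python model, not the actual ZooKeeper client code. It
assumes the client treats everything from the first "/" onward as the chroot
path. The function name split_connect_string is made up for illustration.

```python
def split_connect_string(connect_string):
    """Simplified model (assumption): everything from the first '/' is the chroot."""
    slash = connect_string.find("/")
    if slash >= 0:
        chroot = connect_string[slash:]
        hosts_part = connect_string[:slash]
    else:
        chroot = None
        hosts_part = connect_string
    # A bare "/" is treated the same as having no chroot in this model.
    if chroot == "/":
        chroot = None
    hosts = [h for h in hosts_part.split(",") if h]
    return hosts, chroot


# Connection string as used in the report: chroot repeated after every host.
broken = "zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c"
# Documented form: chroot given once, after the last host.
fixed = "zk0.gambit:2181,zk1.gambit:2181,zk2.gambit:2181/hex8c"

print(split_connect_string(broken))
print(split_connect_string(fixed))
```

Under that assumption, the string from the report yields a single host (zk0)
plus one long chroot, which would be consistent with the quoted output: changing
the last entry only changes the chroot (so the topic is not found), while
changing the first entry leaves no reachable server.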

> Kafka zookeeper client can't connect when the first zookeeper server is 
> offline
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-12513
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12513
>             Project: Kafka
>          Issue Type: Bug
>          Components: zkclient
>    Affects Versions: 2.3.1, 2.4.1, 2.7.0
>         Environment: kafka_2.13-2.7.0, kernel 5.4.0-52-generic (Ubuntu), 
> Scala 2.13.3-400
>            Reporter: Krzysztof Piecuch
>            Priority: Critical
>
> The Kafka ZooKeeper client library will not connect to any ZooKeeper server
> in the "zookeeper string" when the first server is offline. This causes the
> cluster to crash hard, and to get the cluster back into a healthy state the
> first ZooKeeper node must be resurrected.
> The crash does not always happen immediately after zk0 goes offline, because
> Kafka might have connections established to other ZooKeeper instances.
> When a connection gets dropped and Kafka needs to reconnect, everything
> crashes hard.
>  
> Demo:
> This works:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> Topic: duma   PartitionCount: 6       ReplicationFactor: 3    Configs: compression.type=uncompressed,retention.bytes=322122547200
>       Topic: duma     Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
>       Topic: duma     Partition: 1    Leader: 2       Replicas: 2,1,0 Isr: 0,1,2
>       Topic: duma     Partition: 2    Leader: 0       Replicas: 0,2,1 Isr: 0,1,2
>       Topic: duma     Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 1,0,2
>       Topic: duma     Partition: 4    Leader: 2       Replicas: 2,0,1 Isr: 1,0,2
>       Topic: duma     Partition: 5    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
> {code}
> Now let's mess with the zookeeper string and see how the ZooKeeper client
> reacts.
> Changing the last server in the zookeeper string works as expected:
> {{kafka-topics.sh}} connected to ZooKeeper but couldn't find the topic
> (because of the bogus zookeeper string):
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper zk0.gambit:2181/hex8c,zk1.gambit:2181/hex8c,1.1.1.1:2181/hex8c --describe --topic duma
> Error while executing topic command : Topic 'duma' does not exist as expected
> [2021-03-20 23:01:45,535] ERROR java.lang.IllegalArgumentException: Topic 'duma' does not exist as expected
>       at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:484)
>       at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:390)
>       at kafka.admin.TopicCommand$.main(TopicCommand.scala:67)
>       at kafka.admin.TopicCommand.main(TopicCommand.scala)
>  (kafka.admin.TopicCommand$)
> {code}
> However, if the first server in the zookeeper string is unavailable, the
> ZooKeeper client won't connect to any of the servers:
> {code:java}
> root@kafka0:/opt/kafka/current/bin# ./kafka-topics.sh --zookeeper 1.1.1.1:2181/hex8c,zk1.gambit:2181/hex8c,zk2.gambit:2181/hex8c --describe --topic duma
> [2021-03-20 23:02:43,888] WARN Client session timed out, have not heard from server in 30012ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
> Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
>       at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
>       at kafka.zookeeper.ZooKeeperClient$$Lambda$31.000000005D399170.apply$mcV$sp(Unknown Source)
>       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
>       at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>       at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
>       at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
>       at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
>       at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:321)
>       at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
>       at kafka.admin.TopicCommand.main(TopicCommand.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
