Using the `zookeeper-shell.sh` script, I checked the `/brokers/ids` path; it listed only the ID of the unaffected broker.
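
For anyone who wants to script that check, here is a minimal Python sketch. It assumes the shell's `ls /brokers/ids` command prints the child znodes as a bracketed list such as `[1, 2, 3]`. The function names, the expected broker IDs and the sample output are all hypothetical, not taken from our cluster.

```python
import re

# Matches a bracketed list of integers such as "[1, 2, 3]" or "[]".
# Assumption: this is how `ls /brokers/ids` prints the child znodes.
_ID_LIST = re.compile(r"\[\s*((?:\d+\s*,\s*)*\d+)?\s*\]")


def parse_broker_ids(ls_output):
    """Return the set of broker IDs found in the last ID list of the output."""
    matches = _ID_LIST.findall(ls_output)
    if not matches:
        return set()
    return {int(tok) for tok in matches[-1].split(",") if tok.strip()}


def missing_brokers(ls_output, expected_ids):
    """Return the expected broker IDs that are not registered in ZooKeeper."""
    return sorted(set(expected_ids) - parse_broker_ids(ls_output))


if __name__ == "__main__":
    # Hypothetical shell output: only one of three brokers is registered.
    sample = "WatchedEvent state:SyncConnected type:None path:null\n[1]\n"
    print(missing_brokers(sample, [1, 2, 3]))
```

A non-empty result means those brokers have no ephemeral znode under `/brokers/ids`, even if their processes are still running.
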

On Fri, Nov 24, 2017 at 12:49 PM, Kamal <kamal.chandraprak...@gmail.com> wrote:

> Hi Kafka Users,
>
> In our production cluster, we hit the error below on 2 of our 3 brokers.
> After this error, the ISRs were no longer updated and we could not create
> new topics, because the replication factor was higher than the number of
> available brokers.
>
> The session between Kafka and Zookeeper expired. During reconnection, the
> error below occurred:
>
> *[2017-11-23 04:48:39,180] INFO Session establishment complete on server
> x.x.x.76/x.x.x.76:10056, sessionid = 0x25fe606b0dd0000, negotiated timeout
> = 20000 (org.apache.zookeeper.ClientCnxn*
> *)*
> *[2017-11-23 04:48:39,181] INFO zookeeper state changed (SyncConnected)
> (org.I0Itec.zkclient.ZkClient)*
> *[2017-11-23 04:48:39,183] INFO re-registering broker info in ZK for
> broker 3 (kafka.server.KafkaHealthcheck$SessionExpireListener)*
> *[2017-11-23 04:48:39,183] INFO Creating /brokers/ids/3 (is it secure?
> false) (kafka.utils.ZKCheckedEphemeral)*
> *[2017-11-23 04:48:39,186] INFO Result of znode creation is: NODEEXISTS
> (kafka.utils.ZKCheckedEphemeral)*
> *[2017-11-23 04:48:39,186] ERROR Error handling event ZkEvent[New session
> event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@58b411d0]
> (org.I0Itec.zkclient.ZkEventThread)*
> *java.lang.RuntimeException: A broker is already registered on the path
> /brokers/ids/3. This probably indicates that you either have configured a
> brokerid that is already in use, or else you have shutdown this broker and
> restarted it faster than the zookeeper timeout so it appears to be
> re-registering.*
> * at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:408)*
> * at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:394)*
> * at
> kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:71)*
> * at
> kafka.server.KafkaHealthcheck$SessionExpireListener.handleNewSession(KafkaHealthcheck.scala:105)*
> * at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:736)*
> * at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)*
>
> After this exception, all ISR updates get skipped.
>
> *[2017-11-23 04:48:39,340] INFO New leader is 3
> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)*
> *[2017-11-23 04:49:00,008] INFO New leader is 2
> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)*
> *[2017-11-23 04:49:12,774] INFO Partition
> [CHANNEL_CLIENT_LISTENER_CHANGE,0] on broker 3: Cached zkVersion [2] not
> equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)*
>
> Zookeeper is deployed in quorum mode (3 ZK servers). On one of the
> Zookeeper servers, we saw the errors below (kafka-zk default session
> timeout: 20 s). The other two ZK servers seem fine.
>
> https://pastebin.com/9YQABiTL
>
> Finally, we restarted the two affected brokers. We are using Kafka version
> 0.10.2.1 and Zookeeper version 3.4.9.
> Are these session errors fixed in the latest version (1.0.0)? What
> precautionary steps can we take to avoid them?
>
> Regards,
> Kamal C