[ https://issues.apache.org/jira/browse/KAFKA-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Santilli resolved KAFKA-5971.
--------------------------------------
    Resolution: Duplicate

This is getting closed since KAFKA-7165 has been solved.

> Broker keeps running even though not registered in ZK
> -----------------------------------------------------
>
>                 Key: KAFKA-5971
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5971
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.11.0.0
>            Reporter: Igor Canadi
>            Priority: Major
>
> We had a curious situation happen to our Kafka cluster running version
> 0.11.0.0. One of the brokers was happily running, even though its ID was not
> registered in ZooKeeper under `/brokers/ids`.
> Based on the logs, it appears that the broker restarted very quickly and
> there was a node under `/brokers/ids/2` still present from the previous run.
> However, in that case I'd expect the broker to try again or just exit. In
> reality it continued running without any errors in the logs.
> Here's the relevant part of the logs:
> ```
> [2017-09-06 23:50:26,095] INFO Opening socket connection to server zookeeper.kafka.svc.cluster.local/100.66.99.54:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,096] INFO Socket connection established to zookeeper.kafka.svc.cluster.local/100.66.99.54:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,099] WARN Unable to reconnect to ZooKeeper service, session 0x15e4477405f1d40 has expired (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,099] INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)
> [2017-09-06 23:50:26,099] INFO Unable to reconnect to ZooKeeper service, session 0x15e4477405f1d40 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,099] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@2cb4893b (org.apache.zookeeper.ZooKeeper)
> [2017-09-06 23:50:26,102] INFO EventThread shut down for session: 0x15e4477405f1d40 (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,107] INFO Opening socket connection to server zookeeper.kafka.svc.cluster.local/100.66.99.54:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,108] INFO Socket connection established to zookeeper.kafka.svc.cluster.local/100.66.99.54:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,111] INFO Session establishment complete on server zookeeper.kafka.svc.cluster.local/100.66.99.54:2181, sessionid = 0x15e599a1a3e0013, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> [2017-09-06 23:50:26,112] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
> [2017-09-06 23:50:26,114] INFO re-registering broker info in ZK for broker 2 (kafka.server.KafkaHealthcheck$SessionExpireListener)
> [2017-09-06 23:50:26,115] INFO Creating /brokers/ids/2 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
> [2017-09-06 23:50:26,123] INFO Result of znode creation is: NODEEXISTS (kafka.utils.ZKCheckedEphemeral)
> [2017-09-06 23:50:26,124] ERROR Error handling event ZkEvent[New session event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@699f40a0] (org.I0Itec.zkclient.ZkEventThread)
> java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/2. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it
>     at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:417)
>     at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:403)
>     at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:70)
>     at kafka.server.KafkaHealthcheck$SessionExpireListener.handleNewSession(KafkaHealthcheck.scala:104)
>     at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:736)
>     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
> [2017-09-06 23:51:42,257] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2017-09-07 00:00:06,198] INFO Unable to read additional data from server sessionid 0x15e599a1a3e0013, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2017-09-07 00:00:06,354] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
> [2017-09-07 00:00:07,675] INFO Opening socket connection to server zookeeper.kafka.svc.cluster.local/100.66.99.54:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2017-09-07 00:00:07,676] INFO Socket connection established to zookeeper.kafka.svc.cluster.local/100.66.99.54:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> [2017-09-07 00:00:07,680] INFO Session establishment complete on server zookeeper.kafka.svc.cluster.local/100.66.99.54:2181, sessionid = 0x15e599a1a3e0013, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> [2017-09-07 00:00:07,681] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
> [2017-09-07 00:01:42,257] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2017-09-07 00:11:42,257] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2017-09-07 00:21:42,257] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> [2017-09-07 00:31:42,257] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
> ```
> The only message that appears after this point is "Removed 0 expired
> offsets", which is logged every 10 minutes.
> Let me know if I can provide any more information!
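
For illustration only, here is a rough sketch of the "try again or just exit" behaviour the reporter expected when the ephemeral node from the expired session is still present. It is not Kafka's code and not the fix from KAFKA-7165. FakeZk, FakeClock, NodeExistsError and register_with_retry are hypothetical names, and the ZooKeeper side is simulated in memory; the path, session IDs and 6 second timeout are taken from the logs above.

```python
class NodeExistsError(Exception):
    """Hypothetical stand-in for the NODEEXISTS result seen in the logs."""


class FakeClock:
    """Manually advanced clock so the sketch runs without real waiting."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


class FakeZk:
    """In-memory simulation of ephemeral znodes; not a real client API."""

    def __init__(self):
        self.owners = {}        # path -> session id holding the node
        self._stale_until = {}  # path -> time the old session's node vanishes

    def add_stale_node(self, path, old_session, expires_at):
        self.owners[path] = old_session
        self._stale_until[path] = expires_at

    def create_ephemeral(self, path, session_id, now):
        # The stale node disappears once the old session has timed out.
        if path in self._stale_until and now >= self._stale_until[path]:
            del self._stale_until[path]
            del self.owners[path]
        if path in self.owners:
            raise NodeExistsError(path)
        self.owners[path] = session_id


def register_with_retry(zk, path, session_id, retries, backoff_s, clock):
    """Try to create the node; back off on NODEEXISTS; re-raise on the last try.

    Returns the number of the attempt that succeeded. A caller could treat
    the re-raised error as fatal and shut the broker down instead of
    continuing to run unregistered.
    """
    for attempt in range(1, retries + 1):
        try:
            zk.create_ephemeral(path, session_id, clock.time())
            return attempt
        except NodeExistsError:
            if attempt == retries:
                raise
            clock.sleep(backoff_s)


if __name__ == "__main__":
    clock = FakeClock()
    zk = FakeZk()
    # Old session's node lingers for the 6 s session timeout from the logs.
    zk.add_stale_node("/brokers/ids/2", "0x15e4477405f1d40", expires_at=6.0)
    attempts = register_with_retry(
        zk, "/brokers/ids/2", "0x15e599a1a3e0013",
        retries=5, backoff_s=2.0, clock=clock,
    )
    print("registered after", attempts, "attempt(s)")
```

With a 2 second backoff the simulated stale node is gone by the fourth attempt. If the node never goes away, the last attempt re-raises NodeExistsError, which is where a real broker could exit rather than keep running.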