Hi,

I am running into an issue where our Kafka brokers (0.10.2.1) are being removed from Zookeeper (3.4.14). Here is the setup.
We have 3 Zookeeper nodes and 3 Kafka nodes in AWS, and we use auto-scaling groups to get replacement nodes on failure. While both clusters are running, I can see the brokers registered in Zookeeper under the /brokers/ids path.

To test failover, I terminate the Zookeeper leader node and wait for the AWS auto-scaling group to provide a replacement, then check /brokers/ids to confirm the brokers are still registered. I then terminate a second Zookeeper node and check the path again once its replacement comes up. Up to this point everything works.

However, when I terminate the third and last of the original Zookeeper nodes, all of the Kafka brokers' sessions expire and the brokers are removed from Zookeeper: /brokers/ids is empty. I see the following logs on one of the Zookeeper nodes when the final node is replaced:

2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002

I found a ticket
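To make the log excerpt easier to read, here is a small Python sketch that groups the "Deleting ephemeral node" lines by the session that owned them. It only parses the text quoted above; the function name `ephemerals_by_session` and the regular expressions are illustrative assumptions, not part of Zookeeper or Kafka.

```python
import re
from collections import defaultdict

# Excerpt of the Zookeeper server log quoted above (myid:3).
LOG = """\
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b50016, timeout of 6000ms exceeded
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x10003b973b5000e, timeout of 6000ms exceeded
2020-03-26 20:29:20,303 [myid:3] - INFO [SessionTracker:ZooKeeperServer@355] - Expiring session 0x30003a126690002, timeout of 6000ms exceeded
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1002 for session 0x10003b973b50016
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1003 for session 0x10003b973b5000e
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /controller for session 0x30003a126690002
2020-03-26 20:29:20,307 [myid:3] - DEBUG [CommitProcessor:3:DataTree@893] - Deleting ephemeral node /brokers/ids/1001 for session 0x30003a126690002
"""

# Hypothetical patterns matching the two message shapes in the excerpt.
EXPIRE_RE = re.compile(r"Expiring session (0x[0-9a-f]+), timeout of (\d+)ms exceeded")
DELETE_RE = re.compile(r"Deleting ephemeral node (\S+) for session (0x[0-9a-f]+)")


def ephemerals_by_session(log_text):
    """Map each expired session id to (timeout_ms, [ephemeral znodes deleted])."""
    timeouts = {}
    deleted = defaultdict(list)
    for line in log_text.splitlines():
        m = EXPIRE_RE.search(line)
        if m:
            timeouts[m.group(1)] = int(m.group(2))
            continue
        m = DELETE_RE.search(line)
        if m:
            deleted[m.group(2)].append(m.group(1))
    # Keep the order in which sessions were expired.
    return {sid: (timeouts[sid], deleted[sid]) for sid in timeouts}


if __name__ == "__main__":
    for sid, (timeout_ms, nodes) in ephemerals_by_session(LOG).items():
        print(f"{sid} (timeout {timeout_ms} ms): {', '.join(nodes)}")
```

Run against the excerpt, it shows that all three broker sessions expired with the same 6000 ms timeout, and that session 0x30003a126690002 held both /controller and /brokers/ids/1001, which suggests broker 1001 was the controller at the time.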
KAFKA-5473 <https://issues.apache.org/jira/browse/KAFKA-5473> in the Kafka JIRA that describes a problem with Zookeeper session expiration handling. I am not 100% sure it is related to this issue, but I do see the same behaviour: the DMC broker renews its session with the old Zookeeper nodes but not with the new replacement node.

Can anyone help me with this?

Thanks,
Pradeep V.B.