[ https://issues.apache.org/jira/browse/KAFKA-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
José Armando García Sancio resolved KAFKA-15844.
------------------------------------------------
    Resolution: Won't Fix

Marking it as won't fix since Kafka doesn't use ZK anymore.

> Broker doesn't re-register after losing ZK session
> --------------------------------------------------
>
>                 Key: KAFKA-15844
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15844
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>            Reporter: José Armando García Sancio
>            Priority: Major
>              Labels: zookeeper
>
> We experienced a case where a Kafka broker lost connection to the ZK cluster
> and was not able to recreate the registration znode. Only after the broker
> was restarted did the registration znode get created.
> The interesting observation is that the "ACL authorizer" ZK client detected
> the session loss and recreated the ZK client, but the "Kafka server" ZK
> client never received a SessionExpiredException.
> Here is an example session where this happened. The controller sees the
> broker go offline:
> {code:java}
> INFO [Controller id=32] Newly added brokers: , deleted brokers: 37, bounced
> brokers: , all live brokers: ...{code}
> The "ACL authorizer" ZK session is lost and recreated in broker 37:
> {code:java}
> [Broker=37] WARN Client session timed out, have not heard from server in
> 3026ms for sessionid 0x504b9c08b5e0025
> ...
> INFO [ZooKeeperClient ACL authorizer] Session expired.
> ...
> INFO [ZooKeeperClient ACL authorizer] Initializing a new session to ...
> ...
> [Broker=37] INFO Session establishment complete on server ..., sessionid =
> 0x604dd0ad7180045, negotiated timeout = 18000{code}
> Unfortunately, we never see similar logs for the "Kafka server":
> {code:java}
> WARN Client session timed out, have not heard from server in 14227ms for
> sessionid 0x304beeed4930026 (org.apache.zookeeper.ClientCnxn)
> ...
> INFO Client session timed out, have not heard from server in 14227ms for
> sessionid 0x304beeed4930026, closing socket connection and attempting
> reconnect (org.apache.zookeeper.ClientCnxn)
> ...
> WARN Client session timed out, have not heard from server in 4548ms for
> sessionid 0x304beeed4930026 (org.apache.zookeeper.ClientCnxn)
> ...
> INFO Client session timed out, have not heard from server in 4548ms for
> sessionid 0x304beeed4930026, closing socket connection and attempting
> reconnect (org.apache.zookeeper.ClientCnxn){code}
> Maybe we are running into this issue from the ZOOKEEPER-1159 discussion:
> {quote}As I understand it, the problem here may be that a disconnected client
> cannot discover that its session has expired. Only the server can declare a
> session expired which on the client side leads to the
> SessionExpiredException, but only when the client is connected.
> If this assumption is correct, I'm not sure how best to address it.
> {quote}
>
> Restarting broker 37 resolved the issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
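
For readers following up on the ZOOKEEPER-1159 hypothesis quoted in the issue (a disconnected client cannot learn that its session has expired), one possible client-side mitigation is a local watchdog that measures how long the client has been disconnected and, once that exceeds the session timeout, treats the session as lost and rebuilds the client. The sketch below is only an illustration of that idea. `DisconnectWatchdog` and its methods are hypothetical names, not part of Kafka or ZooKeeper, and this is not what either project does. The 18000 ms used in the usage note is simply the negotiated timeout that appears in the log excerpt in the issue.

```java
/**
 * Hypothetical helper (not a Kafka or ZooKeeper class): tracks how long a
 * client has been disconnected so the caller can decide locally that the
 * session has probably expired on the server.
 */
final class DisconnectWatchdog {
    private final long sessionTimeoutMs;
    // True while the client is disconnected and has not yet reconnected.
    private boolean disconnected = false;
    // Time of the first disconnect in the current outage; valid only if disconnected.
    private long disconnectedSinceMs = 0L;

    DisconnectWatchdog(long sessionTimeoutMs) {
        if (sessionTimeoutMs <= 0) {
            throw new IllegalArgumentException("sessionTimeoutMs must be positive");
        }
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Call on every disconnect notification; only the first one in an outage is recorded. */
    synchronized void onDisconnected(long nowMs) {
        if (!disconnected) {
            disconnected = true;
            disconnectedSinceMs = nowMs;
        }
    }

    /** Call on reconnect; a connected client can be told about expiry by the server. */
    synchronized void onConnected() {
        disconnected = false;
    }

    /** True once the current outage has lasted strictly longer than the session timeout. */
    synchronized boolean shouldAssumeExpired(long nowMs) {
        return disconnected && nowMs - disconnectedSinceMs > sessionTimeoutMs;
    }
}
```

A caller would presumably drive `onDisconnected` and `onConnected` from the connection-state events its ZooKeeper watcher receives, and poll `shouldAssumeExpired` from a timer. With `new DisconnectWatchdog(18000L)` and a disconnect recorded at t=1000 ms, the check stays false up to t=19000 ms and turns true after that. Whether acting on such a local guess is safe depends on the application, since the server may still consider the session alive.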