[ 
https://issues.apache.org/jira/browse/KAFKA-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

José Armando García Sancio resolved KAFKA-15844.
------------------------------------------------
    Resolution: Won't Fix

Marking this as Won't Fix since Kafka no longer uses ZK.

> Broker doesn't re-register after losing ZK session
> --------------------------------------------------
>
>                 Key: KAFKA-15844
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15844
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>            Reporter: José Armando García Sancio
>            Priority: Major
>              Labels: zookeeper
>
> We experienced a case where a Kafka broker lost its connection to the ZK 
> cluster and was not able to recreate its registration znode. The znode was 
> recreated only after the broker was restarted.
> The interesting observation is that the "ACL authorizer" ZK client detected 
> that its session was lost and recreated the ZK client, but the "Kafka server" 
> ZK client never received a SessionExpiredException.
> Here is an example session where this happened. The controller sees the 
> broker go offline:
> {code:java}
> INFO [Controller id=32] Newly added brokers: , deleted brokers: 37, bounced 
> brokers: , all live brokers: ...{code}
> "ACL authorizer" ZK session is lost and recreated in broker 37:
> {code:java}
> [Broker=37] WARN Client session timed out, have not heard from server in 
> 3026ms for sessionid 0x504b9c08b5e0025
> ...
> INFO [ZooKeeperClient ACL authorizer] Session expired.
> ...
> INFO [ZooKeeperClient ACL authorizer] Initializing a new session to ...
> ...
> [Broker=37] INFO Session establishment complete on server ..., sessionid = 
> 0x604dd0ad7180045, negotiated timeout = 18000{code}
> Unfortunately, we never see a similar "Session expired" log for the "Kafka 
> server" ZK client, only repeated timeouts and reconnect attempts:
> {code:java}
> WARN Client session timed out, have not heard from server in 14227ms for 
> sessionid 0x304beeed4930026 (org.apache.zookeeper.ClientCnxn)
> ...
> INFO Client session timed out, have not heard from server in 14227ms for 
> sessionid 0x304beeed4930026, closing socket connection and attempting 
> reconnect (org.apache.zookeeper.ClientCnxn)
> ...
> WARN Client session timed out, have not heard from server in 4548ms for 
> sessionid 0x304beeed4930026 (org.apache.zookeeper.ClientCnxn)
> ...
> INFO Client session timed out, have not heard from server in 4548ms for 
> sessionid 0x304beeed4930026, closing socket connection and attempting 
> reconnect (org.apache.zookeeper.ClientCnxn){code}
> We may be running into the issue described in the ZOOKEEPER-1159 discussion:
> {quote}As I understand it, the problem here may be that a disconnected client 
> cannot discover that its session has expired. Only the server can declare a 
> session expired which on the client side leads to the 
> SessionExpiredException, but only when the client is connected.
> If this assumption is correct, I'm not sure how best to address it.
> {quote}
>  
> Restarting broker 37 resolved the issue.
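
The ZOOKEEPER-1159 quote above suggests that a client which stays disconnected cannot learn that its session has expired. One possible mitigation, sketched here purely as an illustration, is a client-side watchdog that presumes the session is gone once the client has been disconnected for at least the negotiated session timeout. The class DisconnectWatchdog and its methods are hypothetical; they are not part of Kafka or ZooKeeper, and only the ZooKeeper server can authoritatively declare a session expired. The sketch assumes the caller forwards connection state changes (for example from a ZooKeeper Watcher) and supplies the timestamps.

```java
/**
 * Hypothetical helper, not part of Kafka or ZooKeeper.
 * Tracks how long a client has been continuously disconnected and reports
 * when that time reaches the negotiated session timeout.
 */
final class DisconnectWatchdog {
    private final long sessionTimeoutMs;
    private boolean disconnected = false;
    private long disconnectedSinceMs = 0L;

    DisconnectWatchdog(long sessionTimeoutMs) {
        if (sessionTimeoutMs <= 0) {
            throw new IllegalArgumentException("sessionTimeoutMs must be positive");
        }
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Call when the client reports a disconnect (e.g. KeeperState.Disconnected). */
    synchronized void onDisconnected(long nowMs) {
        // Keep the first disconnect time so that failed reconnect attempts
        // do not reset the clock.
        if (!disconnected) {
            disconnected = true;
            disconnectedSinceMs = nowMs;
        }
    }

    /** Call when the client reconnects (e.g. KeeperState.SyncConnected). */
    synchronized void onConnected() {
        disconnected = false;
    }

    /**
     * True if the client has been disconnected for at least the session
     * timeout. This is only a presumption made on the client side.
     */
    synchronized boolean presumedExpired(long nowMs) {
        return disconnected && nowMs - disconnectedSinceMs >= sessionTimeoutMs;
    }
}
```

A caller could poll presumedExpired on a timer and, when it returns true, close the existing ZK client and create a new one, much like the "ACL authorizer" client did after it saw "Session expired". Whether that would be safe for the broker registration path is not something this sketch establishes.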



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
