rondagostino opened a new pull request, #12856:
URL: https://github.com/apache/kafka/pull/12856

   KRaft brokers maintain their liveness in the cluster by sending 
BROKER_HEARTBEAT requests to the active controller; the active controller 
fences a broker if it doesn't receive a heartbeat request from that broker 
within the period defined by `broker.session.timeout.ms`. The broker should use 
a request timeout for its BROKER_HEARTBEAT requests that is not larger than the 
session timeout being used by the controller. A larger request timeout creates 
the possibility that, upon controller failover, the broker fails to cancel an 
in-flight heartbeat request and heartbeat to the new controller in time to 
maintain an uninterrupted session in the cluster. In other words, a failure 
of the active controller could cause brokers to be fenced, and thus result in 
under-replicated (or under-min-ISR) partitions, simply due to a delay in 
brokers heartbeating to the new controller.
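
   The constraint above can be sketched as a small check. This is an 
illustrative helper only, not code from this patch or from Kafka: the class 
`HeartbeatTimeoutCheck` and its methods are hypothetical, and the timeout 
values are example numbers chosen to show one failing and one passing case.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Kafka.
class HeartbeatTimeoutCheck {

    // The broker-side request timeout is safe when it does not exceed
    // the session timeout enforced by the controller.
    static boolean isSafe(long controllerSocketTimeoutMs, long brokerSessionTimeoutMs) {
        return controllerSocketTimeoutMs <= brokerSessionTimeoutMs;
    }

    // Same check, reading both values from properties-file text.
    // Assumes both keys are present; a missing key fails with an exception.
    static boolean isSafe(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        long requestTimeoutMs =
            Long.parseLong(props.getProperty("controller.socket.timeout.ms").trim());
        long sessionTimeoutMs =
            Long.parseLong(props.getProperty("broker.session.timeout.ms").trim());
        return isSafe(requestTimeoutMs, sessionTimeoutMs);
    }

    public static void main(String[] args) throws IOException {
        // Example values only.
        String risky = "broker.session.timeout.ms=9000\n"
                     + "controller.socket.timeout.ms=30000\n";
        String safe = "broker.session.timeout.ms=9000\n"
                    + "controller.socket.timeout.ms=9000\n";
        System.out.println(isSafe(risky)); // prints "false"
        System.out.println(isSafe(safe));  // prints "true"
    }
}
```

   In the first case a heartbeat sent to a failed controller could stay 
pending for longer than the session lasts, which is the situation described 
above; in the second the request times out no later than the session would.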
   
   This patch adds documentation to that effect and sets the 
`controller.socket.timeout.ms` config accordingly in the quickstart files. It 
also changes `BrokerToControllerChannelManager.scala` so that the default 
request timeout equals the value of `controller.socket.timeout.ms` rather than 
the generic `request.timeout.ms`. Because the BrokerToControllerChannelManager 
functionality does not currently use this default timeout value, that change 
is purely cosmetic at this time.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

