[ https://issues.apache.org/jira/browse/KAFKA-17793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chia-Ping Tsai resolved KAFKA-17793.
------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Improve kcontroller robustness against long delays
> --------------------------------------------------
>
>                 Key: KAFKA-17793
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17793
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Colin McCabe
>            Assignee: Colin McCabe
>            Priority: Major
>             Fix For: 4.0.0
>
>
> As described in KIP-500, the Kafka controller monitors the liveness of each
> broker in the cluster. It gathers this information from heartbeats sent from
> the brokers themselves.
> In some rare cases, the main controller thread may get blocked for several
> seconds at a time. In the current code, this will result in the controller
> being unable to update the last contact times for the brokers during this
> time.
> This PR changes the controller heartbeat handling to be partially lockless.
> Specifically, the last contact time for each broker will be updated
> locklessly prior to the rest of the heartbeat handling. This will ensure that
> heartbeats always get through.
> Additionally, this PR adds a PeriodicTaskControlManager to better manage
> periodic tasks. This should help handle the very common pattern where we want
> to schedule a background task at some frequency. We also want the background
> task to be immediately rescheduled if there is too much work to be done in
> one event.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
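
For readers unfamiliar with the idea, the lockless last-contact update described in the issue could look roughly like the Java sketch below. This is only an illustration under stated assumptions, not the code from the PR: the class `LastContactTracker` and its methods `recordHeartbeat` and `isExpired` are hypothetical names, and the real controller may store and check contact times differently. The point it shows is that a request-handling thread can record a heartbeat timestamp without waiting for a blocked controller thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and structure are assumptions, not Kafka's API.
final class LastContactTracker {
    // Broker id -> last contact time in nanoseconds. Both the map and the
    // per-broker counters can be updated without holding a controller lock.
    private final ConcurrentHashMap<Integer, AtomicLong> lastContactNs =
        new ConcurrentHashMap<>();

    // Called as soon as a heartbeat arrives, before any work is queued for
    // the (possibly blocked) main controller thread.
    void recordHeartbeat(int brokerId, long nowNs) {
        lastContactNs
            .computeIfAbsent(brokerId, id -> new AtomicLong(nowNs))
            // Keep the newest timestamp even if heartbeats race each other.
            .accumulateAndGet(nowNs, Math::max);
    }

    // Called later from the controller thread when it checks liveness.
    // A broker that has never sent a heartbeat is treated as expired.
    boolean isExpired(int brokerId, long nowNs, long sessionTimeoutNs) {
        AtomicLong last = lastContactNs.get(brokerId);
        return last == null || nowNs - last.get() > sessionTimeoutNs;
    }
}
```

With this shape, a controller thread that stalls for several seconds still sees the contact times recorded during the stall once it resumes, so brokers that kept sending heartbeats are not mistaken for dead ones.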