Colin McCabe created KAFKA-17793:
------------------------------------

             Summary: Improve kcontroller robustness against long delays
                 Key: KAFKA-17793
                 URL: https://issues.apache.org/jira/browse/KAFKA-17793
             Project: Kafka
          Issue Type: Bug
            Reporter: Colin McCabe
            Assignee: Colin McCabe


As described in KIP-500, the Kafka controller monitors the liveness of each 
broker in the cluster. It gathers this information from heartbeats sent from 
the brokers themselves.

In some rare cases, the main controller thread may get blocked for several 
seconds at a time. In the current code, the controller cannot update the 
last contact times for the brokers while it is blocked.

This PR changes the controller heartbeat handling to be partially lockless. 
Specifically, the last contact time for each broker will be updated locklessly 
prior to the rest of the heartbeat handling. This ensures that heartbeats are 
recorded even while the main controller thread is blocked.
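
As a rough illustration of the idea above (not the code in the PR), the last 
contact time can be kept in an atomic value that the request handler updates 
before the heartbeat is handed to the controller thread. The class and method 
names here (LastContactTimes, touch, isStale) are hypothetical and chosen only 
for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not a class from the Kafka code base.
final class LastContactTimes {
    // One atomic timestamp per broker ID.
    private final ConcurrentHashMap<Integer, AtomicLong> lastContactNs =
        new ConcurrentHashMap<>();

    // Called by the thread that receives the heartbeat, before the rest of
    // the heartbeat handling is queued for the controller thread. It never
    // waits for the controller thread. (ConcurrentHashMap may synchronize
    // briefly on first insertion of a broker ID; later updates are a pure
    // atomic operation on the AtomicLong.)
    void touch(int brokerId, long nowNs) {
        lastContactNs
            .computeIfAbsent(brokerId, id -> new AtomicLong(nowNs))
            .accumulateAndGet(nowNs, Math::max); // never move time backwards
    }

    // Called by the controller thread when it looks for stale brokers.
    // A broker that has never been seen is treated as stale.
    boolean isStale(int brokerId, long nowNs, long sessionTimeoutNs) {
        AtomicLong last = lastContactNs.get(brokerId);
        return last == null || nowNs - last.get() > sessionTimeoutNs;
    }
}
```

With this split, a controller thread that was blocked for a few seconds would 
still see the up-to-date contact time when it resumes and checks isStale.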

Additionally, this PR adds a PeriodicTaskControlManager to better manage 
periodic tasks. This should help handle the very common pattern where we want 
to schedule a background task at some frequency, and where we want the task to 
be rescheduled immediately if there is too much work to be done in one event.
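
The scheduling pattern described above can be sketched as follows. This is 
only an illustration under assumptions: the issue does not show the API of 
PeriodicTaskControlManager, so the class below (BatchedPeriodicTask) and its 
parameters (maxPerRun, periodMs) are hypothetical. Each run handles a bounded 
amount of work and returns the delay until the next run, which is zero when 
work is left over.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; not the PeriodicTaskControlManager API.
final class BatchedPeriodicTask {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int maxPerRun;   // upper bound on items handled per event
    private final long periodMs;   // normal delay between runs

    BatchedPeriodicTask(int maxPerRun, long periodMs) {
        this.maxPerRun = maxPerRun;
        this.periodMs = periodMs;
    }

    void add(String item) {
        pending.add(item);
    }

    // Processes at most maxPerRun items, then returns the delay before the
    // next run: 0 if work remains (reschedule immediately), otherwise the
    // regular period.
    long runAndGetNextDelayMs() {
        for (int i = 0; i < maxPerRun && !pending.isEmpty(); i++) {
            pending.poll(); // stand-in for the real unit of work
        }
        return pending.isEmpty() ? periodMs : 0L;
    }
}
```

Bounding the work per run keeps any single event short, so other events can 
be interleaved between runs while the backlog drains.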



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
