Chris Egerton created KAFKA-12476:
-------------------------------------

             Summary: Worker can block for longer than scheduled rebalance delay and/or session key TTL
                 Key: KAFKA-12476
                 URL: https://issues.apache.org/jira/browse/KAFKA-12476
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 3.0.0, 2.3.2, 2.4.2, 2.5.2, 2.8.0, 2.7.1, 2.6.2
            Reporter: Chris Egerton
            Assignee: Chris Egerton


Near the end of each iteration of a distributed worker's herder tick loop, the
worker calculates how long it should poll for rebalance activity before
beginning the next iteration, taking into account deadlines such as the
scheduled rebalance delay and the session key TTL. See
[here|https://github.com/apache/kafka/blob/8da65936d7fc53d24c665c0d01893d25a430933b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L399-L409]
and
[here|https://github.com/apache/kafka/blob/8da65936d7fc53d24c665c0d01893d25a430933b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L459].
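A minimal sketch of that kind of calculation, assuming both deadlines are held as absolute wall-clock times in milliseconds and that `Long.MAX_VALUE` means "nothing scheduled". The class and method names (`PollTimeoutSketch`, `computePollTimeoutMs`, `NO_DEADLINE`) are hypothetical and are not taken from the actual `DistributedHerder` code:

```java
// Hypothetical sketch, not the actual DistributedHerder implementation.
public final class PollTimeoutSketch {
    /** Assumed sentinel meaning "no deadline scheduled". */
    static final long NO_DEADLINE = Long.MAX_VALUE;

    /**
     * Returns how long the worker may poll for rebalance activity before it must
     * wake up for either the scheduled rebalance or the session key expiration.
     * Both deadlines are absolute times in milliseconds.
     */
    static long computePollTimeoutMs(long nowMs, long scheduledRebalanceMs, long keyExpirationMs) {
        long timeoutMs = NO_DEADLINE;
        if (scheduledRebalanceMs != NO_DEADLINE) {
            // Clamp at zero: a deadline already in the past means "do not block at all".
            timeoutMs = Math.min(timeoutMs, Math.max(scheduledRebalanceMs - nowMs, 0L));
        }
        if (keyExpirationMs != NO_DEADLINE) {
            timeoutMs = Math.min(timeoutMs, Math.max(keyExpirationMs - nowMs, 0L));
        }
        return timeoutMs;
    }

    public static void main(String[] args) {
        // Rebalance due at t=5000 ms, key expires at t=60000 ms, current time t=1000 ms.
        System.out.println(computePollTimeoutMs(1_000L, 5_000L, 60_000L)); // 4000
    }
}
```

The result is only valid for the `nowMs` it was computed with; any time that passes between this calculation and the start of polling is not accounted for.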

Between that calculation and the start of polling, the worker (re-)starts some
connectors and tasks. This normally completes within a minute or two at most,
but an overloaded cluster or one in the midst of garbage collection may take
longer. See
[here|https://github.com/apache/kafka/blob/8da65936d7fc53d24c665c0d01893d25a430933b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L411-L452].
Because the poll timeout was computed before this work began, the time spent on
it is not deducted, so the worker can keep polling past the scheduled rebalance
or the session key expiration.

The worker should calculate the time to poll for rebalance activity as close as
possible to the moment it actually begins polling.
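The effect of the ordering can be illustrated with a small deterministic simulation. This is a hypothetical sketch: the fake clock and the method names (`tickStale`, `tickFresh`, `restartConnectorsAndTasks`, `pollForRebalance`) are invented for illustration and are not part of Kafka. It assumes no rebalance activity arrives, so each poll runs for its full timeout:

```java
// Hypothetical simulation; the clock and methods are illustrative, not Kafka APIs.
public final class StaleTimeoutSketch {
    // Fake millisecond clock so the example is deterministic.
    static long clockMs = 0L;

    static long remainingMs(long deadlineMs) {
        return Math.max(deadlineMs - clockMs, 0L);
    }

    /** Simulates connector/task (re-)starts that take restartMs to complete. */
    static void restartConnectorsAndTasks(long restartMs) {
        clockMs += restartMs;
    }

    /** Simulates polling when no rebalance activity arrives: the full timeout elapses. */
    static void pollForRebalance(long timeoutMs) {
        clockMs += timeoutMs;
    }

    /** Problematic ordering: timeout computed before the restarts. Returns the wake-up time. */
    static long tickStale(long deadlineMs, long restartMs) {
        clockMs = 0L;
        long timeoutMs = remainingMs(deadlineMs);
        restartConnectorsAndTasks(restartMs);
        pollForRebalance(timeoutMs);
        return clockMs;
    }

    /** Suggested ordering: timeout computed immediately before polling. Returns the wake-up time. */
    static long tickFresh(long deadlineMs, long restartMs) {
        clockMs = 0L;
        restartConnectorsAndTasks(restartMs);
        long timeoutMs = remainingMs(deadlineMs);
        pollForRebalance(timeoutMs);
        return clockMs;
    }

    public static void main(String[] args) {
        long deadlineMs = 60_000L; // e.g. a scheduled rebalance one minute away
        long restartMs = 90_000L;  // restarts slowed down by GC or an overloaded cluster
        System.out.println("stale wake-up: " + tickStale(deadlineMs, restartMs)); // 150000
        System.out.println("fresh wake-up: " + tickFresh(deadlineMs, restartMs)); // 90000
    }
}
```

With a one-minute deadline and 90 seconds of restarts, the stale ordering wakes at 150 seconds, a full 90 seconds late, while the fresh ordering stops blocking as soon as the restarts finish because the deadline has already passed.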



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
