Scott Kidder created KAFKA-12231:
------------------------------------

             Summary: Consumer Lag increases linearly until a Consumer-Group 
Rebalance is initiated
                 Key: KAFKA-12231
                 URL: https://issues.apache.org/jira/browse/KAFKA-12231
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.6.0
         Environment: Kubernetes 1.12
            Reporter: Scott Kidder
         Attachments: Consumer Lag by Partition.png, Consumer Lag on a Single 
Partition.png, Lag drop on rebalance.png, max-consumer-lag.png

I observed consumer lag increasing linearly for several hours on a single 
topic (480 partitions) read by multiple consumers. The lag stopped growing 
only after I triggered a consumer-group rebalance by replacing one of the 
consumers (the consumers run in Kubernetes, so I deleted a consumer pod and 
let its replacement pod join the group). The rebalance happened at 07:46 UTC 
on the chart below.

!max-consumer-lag.png!
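
For readers unfamiliar with the metric, here is a minimal sketch of how a 
max-lag figure like the one charted above could be derived. It assumes lag is 
defined per partition as the log-end offset minus the group's committed 
offset. The offsets are made-up illustration values, and partitionLag and 
maxLag are hypothetical helpers, not part of Sarama or Kafka.

```go
package main

import "fmt"

// partitionLag returns the lag for one partition: the log-end offset minus
// the group's committed offset. A negative result is clamped to zero, since
// the two offsets may be sampled at slightly different times.
func partitionLag(logEndOffset, committedOffset int64) int64 {
	lag := logEndOffset - committedOffset
	if lag < 0 {
		return 0
	}
	return lag
}

// maxLag returns the partition with the largest lag and that lag.
// Partitions without a committed offset are skipped. Ties go to the lowest
// partition number. It returns partition -1 if nothing could be compared.
func maxLag(logEnd, committed map[int32]int64) (partition int32, lag int64) {
	partition = -1
	for p, end := range logEnd {
		c, ok := committed[p]
		if !ok {
			continue
		}
		l := partitionLag(end, c)
		if partition == -1 || l > lag || (l == lag && p < partition) {
			partition, lag = p, l
		}
	}
	return partition, lag
}

func main() {
	// Hypothetical offsets for three partitions (illustration only).
	logEnd := map[int32]int64{0: 1500, 1: 2300, 2: 900}
	committed := map[int32]int64{0: 1480, 1: 1800, 2: 900}

	p, lag := maxLag(logEnd, committed)
	fmt.Printf("partition %d has the highest lag: %d\n", p, lag)
	// prints "partition 1 has the highest lag: 500"
}
```

Sampling this value at a fixed interval and plotting it over time would give a 
curve comparable to the chart, under the assumption stated above.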

 

Every partition of the topic showed lag, but each one only for a short time 
(the first chart covers all partitions, the second a single partition):

!Consumer Lag by Partition.png!

 

!Consumer Lag on a Single Partition.png!

 

For additional context, the consumer is written in Go and uses v1.27.2 of the 
Shopify Sarama Kafka client. The consumers use the sticky partition assignor, 
so most of them kept their original assignments even after the consumer-group 
rebalance. Nothing about the data being consumed and processed from Kafka 
could explain these punctuated spikes in consumer lag, and the Kafka broker 
logs contained no errors or other significant messages before or after the 
rebalance.
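
One way to quantify how many assignments survived a rebalance is to compare 
the partition-to-consumer mapping before and after it. The sketch below is 
only an illustration: movedPartitions is a hypothetical helper, the consumer 
names are invented, and how the two mappings are captured (for example from 
consumer logs) is left open. It does not use the Sarama API.

```go
package main

import "fmt"

// movedPartitions counts the partitions whose owner in `after` differs from
// the owner in `before`. A partition missing from `after` counts as moved.
func movedPartitions(before, after map[int32]string) int {
	moved := 0
	for p, owner := range before {
		if newOwner, ok := after[p]; !ok || newOwner != owner {
			moved++
		}
	}
	return moved
}

func main() {
	// Hypothetical assignments: consumer-c was replaced by consumer-d.
	before := map[int32]string{0: "consumer-a", 1: "consumer-a", 2: "consumer-b", 3: "consumer-c"}
	after := map[int32]string{0: "consumer-a", 1: "consumer-a", 2: "consumer-b", 3: "consumer-d"}

	moved := movedPartitions(before, after)
	fmt.Printf("%d of %d partitions changed owner\n", moved, len(before))
	// prints "1 of 4 partitions changed owner"
}
```

A small count relative to the total would be consistent with sticky 
assignment, and would suggest the lag drop was not caused by partitions moving 
to different consumers.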

 

The lag dropped within 2 minutes of the consumer-group rebalance (initiated at 
07:46, lag fell at 07:48):

!Lag drop on rebalance.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
