[ 
https://issues.apache.org/jira/browse/KAFKA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880501#comment-16880501
 ] 

ASF GitHub Bot commented on KAFKA-6263:
---------------------------------------

anatasiavela commented on pull request #7045: KAFKA-6263: Expose metrics for 
group and transaction metadata loading duration
URL: https://github.com/apache/kafka/pull/7045
 
 
   [JIRA](https://issues.apache.org/jira/browse/KAFKA-6263)
   
   - Add metrics that show how long group metadata and transaction metadata 
take to load, to help diagnose periods of inactivity seen in consumer groups
   - Tests simulate load times by introducing a delay after each is loaded and 
verify that the measured JMX metric reports the expected value
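
As a rough illustration of the approach described in the two points above, here is a minimal Java sketch of timing a load against an injectable clock and recording the result. It is not the code from this pull request: `LoadTimeSketch`, `LoadTimeMetric`, and `timedLoad` are hypothetical names, the "load" is a placeholder `Runnable`, and nothing is registered with JMX here. Passing the clock in is what would let a test simulate a delay without sleeping.

```java
import java.util.function.LongSupplier;

public class LoadTimeSketch {

    /** Hypothetical holder for the last and maximum observed load time, in milliseconds. */
    static final class LoadTimeMetric {
        private long lastMs = 0L;
        private long maxMs = 0L;

        synchronized void record(long durationMs) {
            lastMs = durationMs;
            maxMs = Math.max(maxMs, durationMs);
        }

        synchronized long lastMs() { return lastMs; }

        synchronized long maxMs() { return maxMs; }
    }

    /** Runs the load and records how long it took according to the supplied clock. */
    static void timedLoad(Runnable load, LongSupplier clockMs, LoadTimeMetric metric) {
        long start = clockMs.getAsLong();
        load.run();
        metric.record(clockMs.getAsLong() - start);
    }

    public static void main(String[] args) {
        // Mock clock: each fake "load" advances time instead of actually sleeping.
        long[] now = {0L};
        LoadTimeMetric metric = new LoadTimeMetric();

        timedLoad(() -> now[0] += 250, () -> now[0], metric); // simulated 250 ms load
        timedLoad(() -> now[0] += 100, () -> now[0], metric); // simulated 100 ms load

        System.out.println("last=" + metric.lastMs() + " max=" + metric.maxMs());
        // prints "last=100 max=250"
    }
}
```

In a real broker the clock would be the system clock and the recorded value would be exposed through whatever metrics registry the broker uses. The sketch only shows the measure-and-record step and how a mocked delay makes the expected value deterministic.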
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Expose metric for group metadata loading duration
> -------------------------------------------------
>
>                 Key: KAFKA-6263
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6263
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jason Gustafson
>            Assignee: Anastasia Vela
>            Priority: Major
>              Labels: needs-kip
>
> We have seen several cases where the log cleaner either wasn't enabled or 
> had experienced some failure, causing __consumer_offsets partitions to grow 
> excessively. When one of these partitions changes leadership, the new 
> coordinator must load the offset cache from the start of the log, which can 
> take arbitrarily long depending on how large the partition has grown (we have 
> seen cases where it took hours). Catching this problem is not always easy 
> because the condition is rare and the symptom tends to be just a long period 
> of inactivity in the consumer group, which gradually gets worse over time. It 
> may therefore be useful to have a broker metric for the load time so that it 
> can be monitored and potentially alerted on. The same goes for the 
> transaction log.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
