jeffkbkim commented on code in PR #15835: URL: https://github.com/apache/kafka/pull/15835#discussion_r1588068337
########## group-coordinator/src/main/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessor.java: ########## @@ -126,9 +126,16 @@ private class EventProcessorThread extends Thread { private void handleEvents() { while (!shuttingDown) { - recordPollStartTime(time.milliseconds()); + // We use a single meter for aggregate idle percentage for the thread pool. + // Since meter is calculated as total_recorded_value / time_window and + // time_window is independent of the number of threads, each recorded idle + // time should be discounted by # threads. + + long idleStartTimeMs = time.milliseconds(); CoordinatorEvent event = accumulator.take(); Review Comment: Yes, I think the main issue with this implementation is that we can have the thread idle ratio go over 1.0 when the recorded idle time includes idle time from the previous window. This is exacerbated when the take() call can block forever, so the idle ratio is unbounded. If we do have a timeout on take(), the idle time overflow is limited to that value which is how it is implemented in https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaRequestHandler.scala#L114 Overall, we will get more accurate idle ratio measurements with a timeout. wdyt? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org