Re: [PR] KAFKA-16307: fix coordinator thread idle ratio [kafka]

via GitHub Thu, 02 May 2024 11:02:47 -0700


jeffkbkim commented on code in PR #15835:
URL: https://github.com/apache/kafka/pull/15835#discussion_r1588068337



##########
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/runtime/MultiThreadedEventProcessor.java:
##########
@@ -126,9 +126,16 @@ private class EventProcessorThread extends Thread {
 
         private void handleEvents() {
             while (!shuttingDown) {
-                recordPollStartTime(time.milliseconds());
+                // We use a single meter for aggregate idle percentage for the 
thread pool.
+                // Since meter is calculated as total_recorded_value / 
time_window and
+                // time_window is independent of the number of threads, each 
recorded idle
+                // time should be discounted by # threads.
+
+                long idleStartTimeMs = time.milliseconds();
                 CoordinatorEvent event = accumulator.take();

Review Comment:
   Yes, I think the main issue with this implementation is that we can have the 
thread idle ratio go over 1.0 when the recorded idle time includes idle time 
from the previous window. This is exacerbated when the take() call can block 
forever, so the idle ratio is unbounded.
   
   If we do have a timeout on take(), the idle time overflow is limited to that 
value which is how it is implemented in 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaRequestHandler.scala#L114
   
   Overall, we will get more accurate idle ratio measurements with a timeout. 
wdyt?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] KAFKA-16307: fix coordinator thread idle ratio [kafka]

Reply via email to