Sam Lendle created KAFKA-7240:
---------------------------------

             Summary: -total metrics in Streams are incorrect
                 Key: KAFKA-7240
                 URL: https://issues.apache.org/jira/browse/KAFKA-7240
             Project: Kafka
          Issue Type: Bug
          Components: metrics, streams
    Affects Versions: 2.0.0
            Reporter: Sam Lendle


I noticed that the values of the *-total metrics in Streams periodically decrease 
when viewed in JMX; for example, process-total for each processor-node-id under 
stream-processor-node-metrics.

Looking at StreamsMetricsThreadImpl, I believe this behavior is caused by using 
Count() as the Stat for the *-total metrics. Count() is a SampledStat, so the 
value it reports is only the count over the recent time windows (samples), and 
the value drops whenever an old window is purged.
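
To illustrate the difference, here is a simplified, hypothetical Java model, not Kafka's actual SampledStat code. WindowedCount keeps only the last numWindows sample windows, as a sampled stat does, while CumulativeCount never forgets anything. The class names, the 30 s window size and the purge rule are assumptions made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Demo entry point: listed first so `java File.java` runs it.
class TotalMetricDemo {
    public static void main(String[] args) {
        WindowedCount windowed = new WindowedCount(30_000L, 2);
        CumulativeCount cumulative = new CumulativeCount();
        // Five events in the first 30 s window, three in the second.
        for (long t = 0; t < 5; t++) {
            windowed.record(1.0, t);
            cumulative.record(1.0, t);
        }
        for (int i = 0; i < 3; i++) {
            windowed.record(1.0, 30_000L);
            cumulative.record(1.0, 30_000L);
        }
        System.out.println(windowed.measure(30_000L));   // 8
        System.out.println(windowed.measure(90_000L));   // 3: first window purged
        System.out.println(cumulative.measure(90_000L)); // 8: never decreases
    }
}

// Hypothetical, simplified windowed count (NOT Kafka's SampledStat): only the
// last numWindows windows are retained, so old events stop being counted.
class WindowedCount {
    private final long windowMs;
    private final int numWindows;
    // Each entry is {windowStartMs, countInWindow}, oldest first.
    private final Deque<long[]> samples = new ArrayDeque<>();

    WindowedCount(long windowMs, int numWindows) {
        this.windowMs = windowMs;
        this.numWindows = numWindows;
    }

    // Like Count#record as described above: the value is ignored, each call adds 1.
    void record(double ignoredValue, long nowMs) {
        long start = nowMs - (nowMs % windowMs);
        if (samples.isEmpty() || samples.peekLast()[0] != start) {
            samples.addLast(new long[] {start, 0L});
        }
        samples.peekLast()[1]++;
        purge(nowMs);
    }

    // Drop windows that ended at or before nowMs - numWindows * windowMs.
    private void purge(long nowMs) {
        long cutoff = nowMs - numWindows * windowMs;
        while (!samples.isEmpty() && samples.peekFirst()[0] + windowMs <= cutoff) {
            samples.removeFirst();
        }
    }

    long measure(long nowMs) {
        purge(nowMs);
        long sum = 0;
        for (long[] s : samples) {
            sum += s[1];
        }
        return sum;
    }
}

// Hypothetical cumulative count: it never discards anything, so it only grows.
class CumulativeCount {
    private long count = 0;

    void record(double ignoredValue, long nowMs) {
        count++;
    }

    long measure(long nowMs) {
        return count;
    }
}
```

With five events in the first window and three in the second, the windowed value drops from 8 to 3 once the first window ages out, which is the same kind of drop seen in process-total; the cumulative value stays at 8.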

----

This explains the behavior I saw, but I think the issue goes deeper. For example, 
processTimeSensor attempts to measure process-latency-avg, 
process-latency-max, process-rate, and process-total. For that sensor, record 
is called like this:

    streamsMetrics.processTimeSensor.record(computeLatency() / (double) processed, timerStartedMs);

so, if I understand correctly, the value passed to record is the average latency 
per processed message in this batch. That value is passed through to 
Count#record, which increments its count by 1 and ignores the value parameter. 
Whatever stat records the total would need to know the number of messages 
processed, which this sensor never receives. Because of that, I don't think it's 
possible for one Sensor to measure both latency and total.
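
A small hypothetical sketch of the mismatch, in plain Java rather than the Kafka metrics API: each batch produces one record call carrying the per-message average latency, so a Count-style total sees 2 calls for 2 batches even though 15 messages were processed. Keeping the message total correct needs a value that carries the batch size, for example a separate sensor whose recorded value is processed and whose stat sums those values. BatchMetrics, recordBatch and the batch sizes are illustrative assumptions.

```java
// Demo entry point: listed first so `java File.java` runs it.
class BatchMetricsDemo {
    public static void main(String[] args) {
        BatchMetrics m = new BatchMetrics();
        m.recordBatch(100.0, 10); // 10 messages, 10 ms each on average
        m.recordBatch(50.0, 5);   // 5 messages, 10 ms each on average
        System.out.println(m.countStatValue());    // 2 (batches, not messages)
        System.out.println(m.messagesProcessed()); // 15
        System.out.println(m.avgLatencyMs());      // 10.0
    }
}

// Hypothetical model, not the Kafka metrics API.
class BatchMetrics {
    // What the latency sensor sees: one value per batch.
    private long recordCalls = 0;     // Count-style stat: +1 per record(), value ignored
    private double latencySum = 0.0;  // feeds an Avg-style stat over recorded values

    // What a separate "total" sensor would see: the batch size as the value.
    private long messagesProcessed = 0;

    void recordBatch(double totalLatencyMs, int processed) {
        if (processed <= 0) {
            throw new IllegalArgumentException("processed must be positive");
        }
        // Mirrors record(computeLatency() / (double) processed, ...).
        double avgPerMessageMs = totalLatencyMs / (double) processed;
        recordCalls++;
        latencySum += avgPerMessageMs;
        // Separate sensor: records the batch size and sums the values.
        messagesProcessed += processed;
    }

    long countStatValue() { return recordCalls; }

    long messagesProcessed() { return messagesProcessed; }

    double avgLatencyMs() { return recordCalls == 0 ? 0.0 : latencySum / recordCalls; }
}
```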

That said, it's not clear to me exactly how all the different Stats and Sensors 
work. For similar reasons, I don't understand how the process-rate metric works, 
although it seems to be correct, so I may be missing something here.

cc [~guozhang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)