Sam Lendle created KAFKA-7240:
---------------------------------

Summary: -total metrics in Streams are incorrect
Key: KAFKA-7240
URL: https://issues.apache.org/jira/browse/KAFKA-7240
Project: Kafka
Issue Type: Bug
Components: metrics, streams
Affects Versions: 2.0.0
Reporter: Sam Lendle
I noticed that the values of the *-total metrics for Streams decrease periodically when viewed in JMX, for example process-total for each processor-node-id under stream-processor-node-metrics. Looking at StreamsMetricsThreadImpl, I believe this behavior is due to using Count() as the Stat for the *-total metrics. Count() is a SampledStat, so the value it reports is the count over recent time windows, and that value decreases whenever a window is purged.

----

This explains the behavior I saw, but I think the issue is deeper. For example, processTimeSensor attempts to measure process-latency-avg, process-latency-max, process-rate, and process-total. For that sensor, record is called like

    streamsMetrics.processTimeSensor.record(computeLatency() / (double) processed, timerStartedMs);

so, if I understand correctly, the value passed to record is the average latency per processed message in this batch. That value is pushed through to the call to Count#record, which increments its count by 1 and ignores the value parameter. Whatever stat records the total would need to know the number of messages processed. Because of that, I don't think it's possible for one Sensor to measure both latency and total.

That said, it's not clear to me how all the different Stats and Sensors work exactly. For similar reasons I don't actually understand how the process-rate metric works, although it seems to be correct, so I may be missing something here.

cc [~guozhang]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
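A simplified plain-Java model helps make the two problems concrete. It is a sketch under stated assumptions, not Kafka's implementation. WindowedCount, CumulativeTotal and TotalMetricsDemo are hypothetical classes, and the 30-second window with 2 samples is chosen only for illustration. WindowedCount mimics the behavior described above: it ignores the recorded value and sums only the samples that have not been purged. As a result, its "total" drops when an old window expires, and it counts record() calls (batches) rather than messages. CumulativeTotal shows what a non-decreasing total fed with the message count would report.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a windowed (sampled) count. This is NOT
// Kafka's SampledStat/Count code; it only mimics the behavior described in the
// issue: record() ignores its value, and measure() sums unpurged samples only.
class WindowedCount {
    private final long windowMs;
    private final int numSamples;
    // Each sample is {windowStartMs, eventCount}.
    private final Deque<long[]> samples = new ArrayDeque<>();

    WindowedCount(long windowMs, int numSamples) {
        this.windowMs = windowMs;
        this.numSamples = numSamples;
    }

    void record(double value, long nowMs) {
        purge(nowMs);
        long[] current = samples.peekLast();
        if (current == null || nowMs - current[0] >= windowMs) {
            // Start a new sample window.
            current = new long[] {nowMs, 0};
            samples.addLast(current);
        }
        current[1]++; // counts the call itself; 'value' is ignored
    }

    long measure(long nowMs) {
        purge(nowMs);
        long sum = 0;
        for (long[] s : samples) {
            sum += s[1];
        }
        return sum;
    }

    // Drop samples that started numSamples or more windows ago.
    private void purge(long nowMs) {
        long maxAgeMs = windowMs * numSamples;
        while (!samples.isEmpty() && nowMs - samples.peekFirst()[0] >= maxAgeMs) {
            samples.removeFirst();
        }
    }
}

// Hypothetical cumulative total: never decreases and sums the recorded values,
// so it must be fed the number of messages processed, not a latency.
class CumulativeTotal {
    private double total;

    void record(double value) {
        total += value;
    }

    double measure() {
        return total;
    }
}

class TotalMetricsDemo {
    public static void main(String[] args) {
        WindowedCount processTotalAsCount = new WindowedCount(30_000L, 2);
        CumulativeTotal messagesProcessed = new CumulativeTotal();

        // Batch 1 at t=0s: 10 messages, average latency 2.0 ms per message.
        processTotalAsCount.record(2.0, 0L);
        messagesProcessed.record(10);

        // Batch 2 at t=40s: 5 messages, average latency 1.0 ms per message.
        processTotalAsCount.record(1.0, 40_000L);
        messagesProcessed.record(5);

        System.out.println(processTotalAsCount.measure(40_000L)); // 2 (batches, not messages)
        System.out.println(processTotalAsCount.measure(70_000L)); // 1 (first window purged)
        System.out.println(messagesProcessed.measure());          // 15.0
    }
}
```

In this model, a correct per-message total needs two things: a cumulative stat, and a record() call that carries the message count. The single latency-valued record() call provides neither.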