[ https://issues.apache.org/jira/browse/FLINK-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954938#comment-15954938 ]
Chesnay Schepler commented on FLINK-4840:
-----------------------------------------

I may have found a suitable alternative implementation.

The key problem with the existing approach is that it measures the time taken for every invocation of the method, which is just too expensive: it requires 2 time measurements per call (which should also use nanoTime, which is even more expensive), as well as a histogram.

My idea would be to
* no longer create a histogram, since this can easily be done outside of Flink, and only provide raw time measurements
* not measure the time for every call, but instead only take a fixed number of measurements over a period of time.

We already have all the tools we need for this: the View interface. We can generalize the details in a new Timer interface:

{code}
public interface Timer extends Metric {
    void start();
    void stop();
    long getTime(); // last measured time
}
{code}

The following TimerView implementation relies on the View interface to re-enable it regularly (every 5 seconds) via the update() method. If the TimerView is not enabled, start() and stop() are no-ops. If it is enabled, it will take a single measurement and then disable itself again.

The implementation could look like this:

{code}
public class TimerView implements Timer, View {
    private boolean enabled = false;
    private long startTime = 0;
    private long lastMeasurement = -1;

    public void update() {
        enabled = true;
    }

    public void start() {
        if (enabled) {
            startTime = System.nanoTime();
        }
    }

    public void stop() {
        if (enabled) {
            lastMeasurement = System.nanoTime() - startTime; // convert to millis or similar
            enabled = false;
        }
    }

    public long getTime() {
        return lastMeasurement;
    }
}
{code}

I quickly threw this together, so there are of course some details missing, like what happens when stop() is never called. But the general approach seems reasonable to me; tell me what you think.
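To make the missing details more concrete, here is a rough, self-contained sketch of one way the TimerView above could handle them. It is only an illustration: the class name SampledTimer and the extra "measuring" flag are hypothetical, the class deliberately does not implement Flink's Metric or View interfaces so that it compiles on its own, and the volatile fields assume that update() and getTime() may be called from a thread other than the one calling start()/stop().

```java
// Hypothetical standalone variant of the TimerView idea; not Flink code.
final class SampledTimer {
    // Set by update(), assumed to possibly come from another thread.
    private volatile boolean enabled = false;
    // True only between a start() that began a sample and its stop().
    private boolean measuring = false;
    private long startTime = 0;
    // Last measured duration in nanoseconds, -1 until the first sample.
    private volatile long lastMeasurement = -1;

    // Requests that the next start()/stop() pair be measured.
    public void update() {
        enabled = true;
    }

    public void start() {
        // Re-evaluated on every call, so a sample whose stop() never
        // arrived is simply dropped instead of producing a stale value.
        measuring = enabled;
        if (measuring) {
            enabled = false;
            startTime = System.nanoTime();
        }
    }

    public void stop() {
        // Only record if the matching start() actually took a timestamp;
        // an update() arriving between start() and stop() is ignored here.
        if (measuring) {
            lastMeasurement = System.nanoTime() - startTime;
            measuring = false;
        }
    }

    public long getTime() {
        return lastMeasurement;
    }
}
```

Compared to the version above, the only behavioural difference is that stop() checks whether start() took a timestamp rather than whether the view is enabled, so an update() that fires between start() and stop() just enables the next pair instead of measuring against an old startTime.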
> Measure latency of record processing and expose it as a metric
> --------------------------------------------------------------
>
>                 Key: FLINK-4840
>                 URL: https://issues.apache.org/jira/browse/FLINK-4840
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>            Reporter: zhuhaifeng
>            Assignee: zhuhaifeng
>            Priority: Minor
>
> We should expose the following Metrics on the TaskIOMetricGroup:
> 1. recordProcessLatency(ms): Histogram measuring the processing time per
> record of a task. It is the processing time of chain if a chained task.
> 2. recordProcTimeProportion(ms): Meter measuring the proportion of record
> processing time for infor whether the main cost

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)