[ 
https://issues.apache.org/jira/browse/FLINK-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954938#comment-15954938
 ] 

Chesnay Schepler commented on FLINK-4840:
-----------------------------------------

I may have found a suitable implementation alternative:

The key problem with the existing approach is that it measures the time taken 
for every invocation of the method. That is just too expensive: it requires 
two time measurements per call (which should use nanoTime, which is even more 
expensive) plus a histogram update.

My idea would be to
* no longer create a histogram since this can be done easily outside of Flink 
and only provide raw time measurements
* not measure the time for every call, but instead only a fixed number of times 
over a period of time. We already have the tool we need for this: the View 
interface.

We can generalize the details in a new Timer interface:
{code}
public interface Timer extends Metric {
        void start();
        void stop();
        long getTime(); // last measured time
}
{code}

The following TimerView implementation relies on the View interface: its 
update() method is called regularly (every 5 seconds) and enables the timer.
If the TimerView is not enabled, start() and stop() are no-ops. If it is 
enabled, it takes a single measurement and then disables itself again.

The implementation could look like this:
{code}
public class TimerView implements Timer, View {
        private boolean enabled = false;
        private long startTime = 0;
        private long lastMeasurement = -1;

        public void update() {
                enabled = true;
        }

        public void start() {
                if (enabled) {
                        startTime = System.nanoTime();
                }
        }

        public void stop() {
                if (enabled) {
                        // convert to millis or smth
                        lastMeasurement = System.nanoTime() - startTime;
                        enabled = false;
                }
        }

        public long getTime() {
                return lastMeasurement;
        }
}
{code}
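
To make the intended call pattern concrete, here is a minimal usage sketch. It assumes local stand-ins for Flink's Metric and View interfaces so that it compiles on its own, repeats the TimerView in condensed form, and uses a hypothetical process() method and a record-count trigger in place of the real periodic update():

```java
// Local stand-ins for Flink's Metric and View interfaces (assumption: only update() is needed).
interface Metric {}

interface View {
    void update();
}

interface Timer extends Metric {
    void start();
    void stop();
    long getTime(); // last measured time in nanoseconds, -1 if none yet
}

// Condensed copy of the TimerView proposed above.
class TimerView implements Timer, View {
    private boolean enabled = false;
    private long startTime = 0;
    private long lastMeasurement = -1;

    public void update() {
        enabled = true;
    }

    public void start() {
        if (enabled) {
            startTime = System.nanoTime();
        }
    }

    public void stop() {
        if (enabled) {
            lastMeasurement = System.nanoTime() - startTime;
            enabled = false;
        }
    }

    public long getTime() {
        return lastMeasurement;
    }
}

class TimerViewDemo {
    // Hypothetical per-record work, standing in for an operator's processing method.
    static long process(long record) {
        return record * 2;
    }

    public static void main(String[] args) {
        TimerView timer = new TimerView();
        for (long record = 0; record < 1_000_000; record++) {
            // Simulates the periodic update(); in Flink this would be driven
            // by time (every 5 seconds), not by the record count.
            if (record % 250_000 == 0) {
                timer.update();
            }
            timer.start();   // no-op unless update() enabled the timer
            process(record);
            timer.stop();    // records one sample, then disables the timer
        }
        System.out.println("last sampled duration (ns): " + timer.getTime());
    }
}
```

Only the first record after each update() pays for the two nanoTime() calls; all other records only pay for a boolean check in start() and stop().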

I quickly threw this together, so there are of course some details missing, 
like what happens when stop() is never called and such.
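
One possible way to cover two of those gaps is sketched below: a stop() that is never called, and an update() that arrives between start() and stop(). This is only an illustration, not a finished design. The class name GuardedTimerView and the measuring flag are hypothetical, the interfaces are local stand-ins, and the volatile fields assume that update() and getTime() may be called from a different thread than start() and stop():

```java
// Local stand-ins for Flink's Metric and View interfaces (assumption).
interface Metric {}

interface View {
    void update();
}

interface Timer extends Metric {
    void start();
    void stop();
    long getTime();
}

// Hypothetical variant of TimerView that tracks whether the current
// start()/stop() pair is actually being measured.
class GuardedTimerView implements Timer, View {
    // Assumption: update() runs on another thread, hence volatile.
    private volatile boolean enabled = false;
    // Only touched by the thread that calls start() and stop().
    private boolean measuring = false;
    private long startTime = 0;
    private volatile long lastMeasurement = -1;

    public void update() {
        enabled = true;
    }

    public void start() {
        // Resetting the flag on every start() drops a measurement whose
        // stop() was never called instead of pairing it with a later stop().
        measuring = enabled;
        if (measuring) {
            enabled = false;
            startTime = System.nanoTime();
        }
    }

    public void stop() {
        // Only record if the matching start() took a timestamp, so an
        // update() that arrives mid-record cannot produce a bogus value.
        if (measuring) {
            lastMeasurement = System.nanoTime() - startTime;
            measuring = false;
        }
    }

    public long getTime() {
        return lastMeasurement;
    }
}
```

The trade-off is one extra field write per start() call. An abandoned measurement is simply lost, and the next update() re-enables sampling.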

But the general approach seems reasonable to me; tell me what you think.

> Measure latency of record processing and expose it as a metric
> --------------------------------------------------------------
>
>                 Key: FLINK-4840
>                 URL: https://issues.apache.org/jira/browse/FLINK-4840
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>            Reporter: zhuhaifeng
>            Assignee: zhuhaifeng
>            Priority: Minor
>
> We should expose the following Metrics on the TaskIOMetricGroup:
> 1. recordProcessLatency(ms): Histogram measuring the processing time per 
> record of a task. It is the processing time of chain if a chained task.  
> 2. recordProcTimeProportion(ms): Meter measuring the proportion of record 
> processing time for infor whether the main cost



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
