[ https://issues.apache.org/jira/browse/FLINK-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196096#comment-16196096 ]
Hai Zhou UTC+8 commented on FLINK-7608:
---------------------------------------

Thanks [~rmetzger], [~StephanEwen] for the suggestions. Agreed, I think we need to revisit which histogram implementation to use for the latency statistics.

*Option #1*
The histogram currently in use is *DescriptiveStatistics* from commons-math3.jar. [source | https://github.com/apache/commons-math/blob/dcaed2dc2838cfbb05d4bb783c5740668e659262/src/main/java/org/apache/commons/math4/stat/descriptive/DescriptiveStatistics.java].
DescriptiveStatistics keeps the input data in memory and can produce "rolling" statistics computed from a "window" consisting of the most recently added values.
* Adding a value takes O(1) time.
* Each call to the min, max, or mean method takes O(N) time.
* Each call to the percentile method (which uses the kthSelector algorithm (binary heap)) takes O(N log2 N) time.

*Option #2*
Dropwizard's *SlidingWindowReservoir* [source | https://github.com/dropwizard/metrics/blob/4.0-development/metrics-core/src/main/java/com/codahale/metrics/SlidingTimeWindowReservoir.java].
A histogram with a sliding window reservoir produces quantiles that are representative of the past N measurements.
* Adding a value takes O(1) time.
* When the reporter reports metrics, it must take a snapshot (which requires Arrays.sort); this takes O(N log2 N) time.
* After the snapshot, each call to the min, max, mean, or percentile method takes O(1) time.

*My summary:*
In our use case, SlidingWindowReservoir costs much less time than DescriptiveStatistics, and the code is also simpler.
If we use SlidingWindowReservoir, we only need to make the following change: replace
new DescriptiveStatistics(size)
with
new DropwizardHistogramWrapper(new com.codahale.metrics.Histogram(new SlidingWindowReservoir(size)))

[~rmetzger], what do you think?
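To illustrate the cost model behind Option #2, here is a minimal, standard-library-only sketch of a sliding-window reservoir. It is not Dropwizard's actual code: the class name SlidingWindowSketch, its method names, and the nearest-rank quantile rule are assumptions made for illustration only. It shows an O(1) update into a circular buffer, a single sort at snapshot time, and constant-time reads afterwards.

```java
import java.util.Arrays;

/** Hypothetical, simplified sliding-window reservoir; not Dropwizard's implementation. */
public class SlidingWindowSketch {
    private final long[] window; // circular buffer holding the most recent N values
    private long count;          // total number of values ever added

    public SlidingWindowSketch(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.window = new long[size];
    }

    /** O(1): overwrite the oldest slot in the circular buffer. */
    public void update(long value) {
        window[(int) (count % window.length)] = value;
        count++;
    }

    /** O(N log N): copy the filled part of the window and sort it once. */
    public Snapshot snapshot() {
        int n = (int) Math.min(count, window.length);
        long[] copy = Arrays.copyOf(window, n);
        Arrays.sort(copy);
        return new Snapshot(copy);
    }

    /** Immutable sorted view; every getter below runs in O(1). */
    public static final class Snapshot {
        private final long[] sorted;
        private final double mean; // computed once when the snapshot is built

        Snapshot(long[] sorted) {
            this.sorted = sorted;
            double sum = 0;
            for (long v : sorted) {
                sum += v;
            }
            this.mean = sorted.length == 0 ? 0.0 : sum / sorted.length;
        }

        public long getMin() { return sorted.length == 0 ? 0 : sorted[0]; }

        public long getMax() { return sorted.length == 0 ? 0 : sorted[sorted.length - 1]; }

        public double getMean() { return mean; }

        /** Nearest-rank quantile (an assumption; real libraries may interpolate). */
        public double getValue(double quantile) {
            if (quantile < 0.0 || quantile > 1.0) {
                throw new IllegalArgumentException("quantile must be in [0, 1]");
            }
            if (sorted.length == 0) {
                return 0.0;
            }
            int idx = (int) Math.ceil(quantile * sorted.length) - 1;
            return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
        }
    }

    public static void main(String[] args) {
        SlidingWindowSketch reservoir = new SlidingWindowSketch(3);
        for (long latency : new long[] {5, 1, 4, 2}) {
            reservoir.update(latency); // the fourth value evicts the first
        }
        Snapshot s = reservoir.snapshot(); // sort once per report
        System.out.println("min=" + s.getMin() + " max=" + s.getMax()
                + " mean=" + s.getMean() + " p50=" + s.getValue(0.5));
    }
}
```

With this shape, a reporter pays for one sort per reporting interval and can then read as many statistics as it needs from the same snapshot, instead of paying an O(N) or worse cost for every individual statistic.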
> LatencyGauge change to histogram metric
> ----------------------------------------
>
>                 Key: FLINK-7608
>                 URL: https://issues.apache.org/jira/browse/FLINK-7608
>             Project: Flink
>          Issue Type: Bug
>          Components: Metrics
>            Reporter: Hai Zhou UTC+8
>            Assignee: Hai Zhou UTC+8
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
>
>
> I used slf4jReporter[https://issues.apache.org/jira/browse/FLINK-4831] to export metrics to the log file.
> I found:
> {noformat}
> -- Gauges ---------------------------------------------------------------------
> ......
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming Job.Map.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=-1}={p99=116.0, p50=59.5, min=11.0, max=116.0, p95=116.0, mean=61.833333333333336}}
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming Job.Sink- Unnamed.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=0}={p99=195.0, p50=163.5, min=115.0, max=195.0, p95=195.0, mean=161.0}}
> ......
> {noformat}