[ 
https://issues.apache.org/jira/browse/FLINK-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196096#comment-16196096
 ] 

Hai Zhou UTC+8 commented on FLINK-7608:
---------------------------------------

Thanks [~rmetzger] and [~StephanEwen] for the suggestions.
Agreed, I think we need to revisit which histogram implementation to use for 
latency statistics.

*Option #1*
The histogram implementation currently in use is *DescriptiveStatistics* from commons-math3.jar.
[source | 
https://github.com/apache/commons-math/blob/dcaed2dc2838cfbb05d4bb783c5740668e659262/src/main/java/org/apache/commons/math4/stat/descriptive/DescriptiveStatistics.java].

DescriptiveStatistics maintains the input data in memory and has the capability 
of producing "rolling" statistics computed from a "window" consisting of the 
most recently added values.

*  Adding a value takes O(1) time.

*  Each call to min, max, or mean takes O(N) time.

*  Each call to the percentile method (which uses the KthSelector algorithm, 
backed by a binary heap) takes O(N log2 N) time.
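
To illustrate the cost profile listed above, here is a minimal stdlib-only sketch of a rolling window whose statistics are recomputed on every call. It is not the commons-math code: the class name RollingStatsSketch and its methods are hypothetical, and the percentile is computed by sorting a copy (nearest-rank) rather than by a selection algorithm.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: statistics are recomputed from the window on every call. */
final class RollingStatsSketch {
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private final int size;

    RollingStatsSketch(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.size = size;
    }

    /** O(1): drop the oldest value once the window is full, then append. */
    void addValue(double v) {
        if (window.size() == size) window.removeFirst();
        window.addLast(v);
    }

    /** O(N): scans the whole window on each call. */
    double getMean() {
        if (window.isEmpty()) return Double.NaN;
        double sum = 0;
        for (double v : window) sum += v;
        return sum / window.size();
    }

    /** O(N): scans the whole window on each call. */
    double getMax() {
        double max = Double.NaN;
        for (double v : window) if (Double.isNaN(max) || v > max) max = v;
        return max;
    }

    /** O(N log N) per call in this sketch: copies and sorts the window. p is in (0, 100]. */
    double getPercentile(double p) {
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        if (window.isEmpty()) return Double.NaN;
        double[] copy = new double[window.size()];
        int i = 0;
        for (double v : window) copy[i++] = v;
        Arrays.sort(copy);
        int rank = (int) Math.ceil(p / 100.0 * copy.length); // nearest-rank definition
        return copy[Math.max(rank, 1) - 1];
    }
}
```

Reporting p50, p95 and p99 with this shape repeats the O(N log N) work three times per report.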

*Option #2*
Dropwizard's *SlidingWindowReservoir* [source | 
https://github.com/dropwizard/metrics/blob/4.0-development/metrics-core/src/main/java/com/codahale/metrics/SlidingTimeWindowReservoir.java].

A histogram with a sliding window reservoir produces quantiles which are 
representative of the past N measurements.

* Adding a value takes O(1) time.

* When metrics are reported, a snapshot must be taken (which requires 
Arrays.sort), costing O(N log2 N) time.

* After the snapshot, each call to min, max, mean, or percentile takes O(1) 
time.
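
The sort-once-then-read pattern described above can be sketched as follows. This is a simplified stdlib-only illustration, not Dropwizard's actual code: SlidingWindowSketch, its Snapshot class and the nearest-rank quantile are assumptions made for the example, and the mean is precomputed when the snapshot is built so that it can be read in O(1).

```java
import java.util.Arrays;

/** Hypothetical sketch of a fixed-size sliding window with sorted snapshots. */
final class SlidingWindowSketch {
    private final long[] window; // circular buffer holding the last N values
    private long count;          // total number of values ever added

    SlidingWindowSketch(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.window = new long[size];
    }

    /** O(1): overwrite the oldest slot. */
    void update(long value) {
        window[(int) (count++ % window.length)] = value;
    }

    /** O(N log N): copy the filled part of the window and sort it once. */
    Snapshot snapshot() {
        int filled = (int) Math.min(count, window.length);
        long[] sorted = Arrays.copyOf(window, filled);
        Arrays.sort(sorted);
        return new Snapshot(sorted);
    }

    /** Immutable view over sorted values; every getter is O(1). */
    static final class Snapshot {
        private final long[] sorted;
        private final double mean;

        Snapshot(long[] sorted) {
            this.sorted = sorted;
            double sum = 0;
            for (long v : sorted) sum += v; // one O(N) pass at construction time
            this.mean = sorted.length == 0 ? 0.0 : sum / sorted.length;
        }

        long getMin() { return sorted.length == 0 ? 0 : sorted[0]; }
        long getMax() { return sorted.length == 0 ? 0 : sorted[sorted.length - 1]; }
        double getMean() { return mean; }

        /** Nearest-rank quantile for q in [0, 1]. */
        long getQuantile(double q) {
            if (q < 0.0 || q > 1.0) throw new IllegalArgumentException("q must be in [0, 1]");
            if (sorted.length == 0) return 0;
            int rank = (int) Math.ceil(q * sorted.length);
            return sorted[Math.max(rank, 1) - 1];
        }
    }
}
```

With this shape, a reporter takes one snapshot per report and reads p50, p95, p99, min, max and mean from it without sorting again.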

*My summary:*

In our use case, SlidingWindowReservoir costs much less time than 
DescriptiveStatistics, and the code is also simpler.

If we use SlidingWindowReservoir, we only need to make the following change:
replace new DescriptiveStatistics(size) with 
new DropwizardHistogramWrapper(new com.codahale.metrics.Histogram(new 
SlidingWindowReservoir(size))) 

[~rmetzger], what do you think?

> LatencyGauge change to  histogram metric
> ----------------------------------------
>
>                 Key: FLINK-7608
>                 URL: https://issues.apache.org/jira/browse/FLINK-7608
>             Project: Flink
>          Issue Type: Bug
>          Components: Metrics
>            Reporter: Hai Zhou UTC+8
>            Assignee: Hai Zhou UTC+8
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
>
>
> I used slf4jReporter[https://issues.apache.org/jira/browse/FLINK-4831]  to 
> export metrics to the log file.
> I found:
> {noformat}
> -- Gauges 
> ---------------------------------------------------------------------
> ......
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Map.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=-1}={p99=116.0, 
> p50=59.5, min=11.0, max=116.0, p95=116.0, mean=61.833333333333336}}
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Sink- Unnamed.0.latency: 
> value={LatencySourceDescriptor{vertexID=1, subtaskIndex=0}={p99=195.0, 
> p50=163.5, min=115.0, max=195.0, p95=195.0, mean=161.0}}
> ......
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
