[ 
https://issues.apache.org/jira/browse/KUDU-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840542#comment-17840542
 ] 

ASF subversion and git services commented on KUDU-3566:
-------------------------------------------------------

Commit b236d534abeb60520e4568bb4a1452d6674bb597 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b236d534a ]

KUDU-3566 fix summary metrics in Prometheus format

This patch corrects the output of various Kudu metrics backed by HDR
histograms.  From the Prometheus perspective, those metrics are output
as summaries [1], not histograms [2].  It's necessary to mark them
accordingly to avoid misinterpretation of the collected statistics.
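
For illustration only, here is a minimal Python sketch of a summary-type
rendering in the Prometheus text exposition format.  The render_summary()
helper is hypothetical and is not Kudu code; the exact output produced by
this patch may differ in details such as label order.  The sample values
come from the ListMasters snippet in the issue description.

```python
def render_summary(name, help_text, quantiles, total_sum, total_count, labels=None):
    """Render one metric as a Prometheus summary (hypothetical helper)."""
    base = dict(labels or {})

    def fmt_labels(extra=None):
        # Merge the common labels with any per-sample label.
        merged = dict(base)
        merged.update(extra or {})
        if not merged:
            return ""
        inner = ",".join('{}="{}"'.format(k, v) for k, v in merged.items())
        return "{" + inner + "}"

    lines = [
        "# HELP {} {}".format(name, help_text),
        "# TYPE {} summary".format(name),
    ]
    # A summary carries quantiles in the 'quantile' label; the sample value
    # is the observed value at that quantile, not an observation count.
    for q, value in quantiles:
        lines.append("{}{} {}".format(name, fmt_labels({"quantile": q}), value))
    lines.append("{}_sum{} {}".format(name, fmt_labels(), total_sum))
    lines.append("{}_count{} {}".format(name, fmt_labels(), total_count))
    return "\n".join(lines)


text = render_summary(
    "kudu_master_handler_latency_kudu_master_MasterService_ListMasters",
    "Microseconds spent handling kudu.master.MasterService.ListMasters RPC requests",
    [("0.75", 324), ("0.95", 468), ("0.99", 844), ("0.999", 844), ("0.9999", 844)],
    total_sum=7833,
    total_count=26,
    labels={"unit_type": "microseconds"},
)
print(text)
```

The only structural differences from the histogram form are the TYPE line,
the 'quantile' label in place of 'le', and the absence of the '_bucket'
suffix and the '+Inf' sample.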

I updated corresponding unit tests and verified that the updated output
was properly parsed and interpreted by a Prometheus 2.50.0 instance
running on my macOS laptop.

[1] https://prometheus.io/docs/concepts/metric_types/#summary
[2] https://prometheus.io/docs/concepts/metric_types/#histogram

Change-Id: I1375ddf1b0ecd730327cd44b4955813b80107f7b
Reviewed-on: http://gerrit.cloudera.org:8080/21338
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>


> Incorrect semantics for Prometheus-style histogram metrics
> ----------------------------------------------------------
>
>                 Key: KUDU-3566
>                 URL: https://issues.apache.org/jira/browse/KUDU-3566
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.17.0
>            Reporter: Alexey Serbin
>            Priority: Major
>              Labels: metrics, observability
>
> The original KUDU-3375 implementation incorrectly exposes data collected by 
> HDR histograms as [histogram-type Prometheus 
> metrics|https://prometheus.io/docs/concepts/metric_types/#histogram] instead 
> of [summary-type 
> ones|https://prometheus.io/docs/concepts/metric_types/#summary].  For example, 
> below are snippets from {{/metrics}} and {{/metrics_prometheus}} with 
> statistics on the ListMasters RPC.  In the Prometheus-style snippet, the 
> {{le}} labels carry quantiles and the bucket values carry latencies rather 
> than cumulative observation counts, so the data should have been reported as 
> a summary metric instead.
> JSON-style:
> {noformat}
> {
>     "name": "handler_latency_kudu_master_MasterService_ListMasters",
>     "total_count": 26,
>     "min": 152,
>     "mean": 301.2692307692308,
>     "percentile_75": 324,
>     "percentile_95": 468,
>     "percentile_99": 844,
>     "percentile_99_9": 844,
>     "percentile_99_99": 844,
>     "max": 844,
>     "total_sum": 7833
> }
> {noformat}
> Prometheus-style counterpart:
> {noformat}
> # HELP kudu_master_handler_latency_kudu_master_MasterService_ListMasters Microseconds spent handling kudu.master.MasterService.ListMasters RPC requests
> # TYPE kudu_master_handler_latency_kudu_master_MasterService_ListMasters histogram
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.75"} 324
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.95"} 468
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.99"} 844
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.999"} 844
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.9999"} 844
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="+Inf"} 26
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_sum{unit_type="microseconds"} 7833
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_count{unit_type="microseconds"} 26
> {noformat}
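
To make the mismatch concrete: in a Prometheus histogram, each bucket value
is a cumulative count of observations at or below its 'le' bound, so the
values must not decrease as 'le' grows and none can exceed '_count'.  The
following Python sketch checks those two properties against the
histogram-typed samples quoted above.  The histogram_violations() helper is
hypothetical and is not part of Kudu or Prometheus.

```python
import math


def histogram_violations(buckets, total_count):
    """List reasons why (le, value) pairs cannot be cumulative bucket counts."""
    problems = []
    ordered = sorted(buckets, key=lambda b: b[0])
    # Cumulative counts must be non-decreasing as the upper bound grows.
    for (lo_le, lo_n), (hi_le, hi_n) in zip(ordered, ordered[1:]):
        if hi_n < lo_n:
            problems.append(
                "count drops from {} (le={}) to {} (le={})".format(
                    lo_n, lo_le, hi_n, hi_le))
    # No bucket can hold more observations than were recorded in total.
    for le, n in ordered:
        if n > total_count:
            problems.append(
                "bucket le={} holds {} > total count {}".format(
                    le, n, total_count))
    return problems


# Samples from the histogram-typed output quoted above.
buckets = [(0.75, 324), (0.95, 468), (0.99, 844),
           (0.999, 844), (0.9999, 844), (math.inf, 26)]
problems = histogram_violations(buckets, total_count=26)
for p in problems:
    print(p)
```

Every finite bucket fails the second check because its value is a latency
in microseconds, not a count, which is the misinterpretation the issue
describes.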



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
