Re: Problem monitoring Flink metrics with Prometheus Pushgateway

李佳宸 Mon, 11 May 2020 20:29:01 -0700

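Thank you very much! Still, in my setup there really is no problem when randomJobNameSuffix is true, which is strange.
Also, with the Prometheus reporter I see far fewer metrics than with the Pushgateway reporter. Have you run into this as well?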
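
For context, Pushgateway's documentation describes a PUT push as replacing every metric in the target group, while a POST push replaces only metrics with the same name. The sketch below is a small in-memory model of that difference, not the real Pushgateway server or the Prometheus Java client. The class PushGroupSketch, its methods, and the metric values are hypothetical, and the assumption that a jobmanager and a taskmanager end up pushing to the same group when randomJobNameSuffix is false is only one plausible reading of the symptoms in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of Pushgateway groups; not the real server or client. */
public class PushGroupSketch {
    // group key (here simply the job name) -> metric name -> last pushed value
    private final Map<String, Map<String, Double>> groups = new HashMap<>();

    /** PUT semantics: the pushed metrics replace everything already in the group. */
    public void put(String job, Map<String, Double> metrics) {
        groups.put(job, new TreeMap<>(metrics));
    }

    /** POST semantics: only metrics with the same name are replaced; others stay. */
    public void post(String job, Map<String, Double> metrics) {
        groups.computeIfAbsent(job, k -> new TreeMap<>()).putAll(metrics);
    }

    /** Returns the metrics currently held for a group (empty if none). */
    public Map<String, Double> group(String job) {
        return groups.getOrDefault(job, new TreeMap<>());
    }

    public static void main(String[] args) {
        // Illustrative values only; assume both processes push under job name "cluster1".
        Map<String, Double> fromJobManager =
                Collections.singletonMap("flink_jobmanager_Status_JVM_CPU_Load", 0.12);
        Map<String, Double> fromTaskManager =
                Collections.singletonMap("flink_taskmanager_Status_JVM_CPU_Load", 0.34);

        PushGroupSketch viaPut = new PushGroupSketch();
        viaPut.put("cluster1", fromJobManager);
        viaPut.put("cluster1", fromTaskManager); // wipes the jobmanager metric
        System.out.println("PUT:  " + viaPut.group("cluster1").keySet());

        PushGroupSketch viaPost = new PushGroupSketch();
        viaPost.post("cluster1", fromJobManager);
        viaPost.post("cluster1", fromTaskManager); // jobmanager metric survives
        System.out.println("POST: " + viaPost.group("cluster1").keySet());
    }
}
```

Under that assumption, a random job-name suffix would give each process its own group, so a PUT from one process would no longer erase the other's metrics. That would be consistent with the setting appearing to fix the problem without being its root cause.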


On Tue, May 12, 2020 at 10:22 AM, 972684638 <[email protected]> wrote:

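> I'm not sure whether this counts as a bug, but I did run into the problem you describe, and after some time investigating I eventually resolved it.
>
> It has nothing to do with metrics.reporter.promgateway.randomJobNameSuffix. I suggest reading the official Pushgateway documentation carefully to understand the difference between the PUT and POST push methods.
>
> Then find the method org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter#report in the flink-metrics-prometheus package, change the push method it uses, and rebuild the package. That should fix it. Glad I could help.
> For the detailed troubleshooting process, see my article:
> https://daijiguo.blog.csdn.net/article/details/105453643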
>
> ------------------ Original message ------------------
> From: "李佳宸" <[email protected]>;
> Sent: Tuesday, May 12, 2020, 8:57 AM
> To: "user-zh" <[email protected]>;
>
> Subject: Problem monitoring Flink metrics with Prometheus Pushgateway
>
>
>
> Hello!
>
> While monitoring Flink with Prometheus I ran into a problem that may or may not be a bug. Details below.
>
> Versions
> Flink 1.9.1
> Prometheus 2.18
> pushgateway 1.2.0
>
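> Problem:
> After setting metrics.reporter.promgateway.randomJobNameSuffix to false, some metrics are not pushed to Pushgateway correctly. Specifically, some metrics (mainly jobmanager-related ones, such as flink_jobmanager_Status_JVM_CPU_Load) do not persist in Pushgateway: when I refresh repeatedly, they disappear and then reappear. Other metrics are missing entirely, such as flink_jobmanager_job_fullRestarts.
>
> When metrics.reporter.promgateway.randomJobNameSuffix is set to true, everything works normally.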
>
> My relevant configuration:
> metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
> metrics.reporter.promgateway.host: localhost
> metrics.reporter.promgateway.port: 9091
> metrics.reporter.promgateway.jobName: cluster1
> metrics.reporter.promgateway.randomJobNameSuffix: *false*
> metrics.reporter.promgateway.deleteOnShutdown: *false*
>
> I hope you can help clear up my confusion. Thanks!
