Flink Metric Naming And Reporting Confusion

Saad Mufti Fri, 17 Feb 2023 06:24:46 -0800

Hi,

My team just started coding a new Flink app to be deployed under AWS EMR.
We are a little confused by the naming of built-in metrics (not ones we
create ourselves) and by how metrics are reported via StatsD.


We have added configuration to flink-conf.yaml to set up a StatsD
reporter. It reports to a proprietary back end of our own. What we have
observed is that we're getting metrics of the form:

ip-10-76-10-112_ec2_internal.taskmanager.container_1676482557753_0002_01_000006.Status.JVM.CPU.Load

Going through the documentation (
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-scope)
we see that the host (IP address) and task manager ID are added by the
scope for built-in system metrics. But we can't understand how it is
useful to give a totally different metric name to every task manager's
metric. Why doesn't it instead report all of these with the name
"Status.JVM.CPU.Load" and add the host and task manager ID as tags?
What's the design motivation for changing every metric name as opposed to
using tags?
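
To make the question concrete, here is a minimal Python sketch of the split
we have in mind: one shared metric name plus tags. It assumes the names
follow the layout <host>.taskmanager.<tm_id>.<metric name>, as in the
example above. The function name split_scoped_name and the tag keys "host"
and "tm_id" are hypothetical, chosen only for illustration; none of this is
Flink or StatsD API.

```python
def split_scoped_name(full_name):
    """Split a scoped task manager metric name into (base name, tags).

    Assumes the layout <host>.taskmanager.<tm_id>.<metric name>, where the
    host and task manager ID contain no dots (as in the example above).
    """
    # Split off at most three leading components; the rest is the metric name.
    parts = full_name.split(".", 3)
    if len(parts) != 4 or parts[1] != "taskmanager":
        raise ValueError(f"not a task manager scoped metric: {full_name}")
    host, _, tm_id, metric = parts
    # Hypothetical tag keys, for illustration only.
    return metric, {"host": host, "tm_id": tm_id}


example = (
    "ip-10-76-10-112_ec2_internal.taskmanager."
    "container_1676482557753_0002_01_000006.Status.JVM.CPU.Load"
)
name, tags = split_scoped_name(example)
print(name)
print(tags)
```

With this shape, every task manager would report the same name
("Status.JVM.CPU.Load"), and the back end could filter or group by tag.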

Also, we're getting a few of the metrics listed at
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-metrics
in our back end, but none of the application/operator-level metrics that
are supposedly also maintained by the framework. We CAN see those directly
in the Flink dashboard. So why aren't those also being reported via the
StatsD reporter?

Any help or insight would be most appreciated.

Thanks.

----
Saad
