Hi,

We are using Datadog as our metrics reporter, as documented here:
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/metric_reporters/#datadog

Our JobManager scope is:
    metrics.scope.jm: flink.jobmanager
    metrics.scope.jm.job: flink.jobmanager
Since Datadog doesn't allow placeholders in metric names, we cannot include
the <host> or <job_name> placeholders in the scope.
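
To illustrate what a constant scope means, here is a toy Python model of scope expansion. It is only a sketch, not Flink's implementation: `expand_scope` and the host names `jm-0` / `jm-1` are made up for the example. It shows that a scope format without placeholders yields the same prefix for every JobManager, while Flink's default `<host>.jobmanager` format yields one prefix per host.

```python
import re


def expand_scope(scope_format, variables):
    """Replace each <name> placeholder with its value from `variables`."""
    return re.sub(r"<(\w+)>", lambda m: variables[m.group(1)], scope_format)


# Hypothetical variables for two JobManager pods running the same job.
pod_a = {"host": "jm-0", "job_name": "my-job"}
pod_b = {"host": "jm-1", "job_name": "my-job"}

# Constant scope, as in our config: both pods map to the same prefix.
fixed = "flink.jobmanager"
same_prefix = expand_scope(fixed, pod_a) == expand_scope(fixed, pod_b)
print("constant scope identical across pods:", same_prefix)

# Scope with a placeholder (Flink's default for metrics.scope.jm):
# the prefix differs per host.
templated = "<host>.jobmanager"
print(expand_scope(templated, pod_a), expand_scope(templated, pod_b))
```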

This setup worked nicely on our standalone Kubernetes application
deployment without HA.
But when we set up HA, we lost the checkpointing metrics in Datadog, and we
now see these warnings in the JobManager log:

2021-10-01 04:22:09,920 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'totalNumberOfCheckpoints'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,920 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'numberOfInProgressCheckpoints'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,920 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'numberOfCompletedCheckpoints'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'numberOfFailedCheckpoints'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'lastCheckpointRestoreTimestamp'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'lastCheckpointSize'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'lastCheckpointDuration'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'lastCheckpointProcessedData'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'lastCheckpointPersistedData'. Metric will not be reported.[flink, jobmanager]
2021-10-01 04:22:09,921 WARN  org.apache.flink.metrics.MetricGroup [] - Name collision: Group already contains a Metric with the name 'lastCheckpointExternalPath'. Metric will not be reported.[flink, jobmanager]
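
As far as I understand the warning, it is raised when a metric name is registered a second time in a group that resolves to the same scope. The toy Python model below sketches that behaviour. It is an assumption-laden illustration, not Flink's code: `ToyMetricGroup` is a made-up class, and I have not verified which component actually performs the second registration in HA mode.

```python
class ToyMetricGroup:
    """Toy stand-in for a metric group identified by its scope components."""

    def __init__(self, scope_components):
        self.scope_components = scope_components
        self.metrics = {}
        self.warnings = []

    def register(self, name, metric):
        """Register a metric; on a duplicate name, warn and drop the new one."""
        if name in self.metrics:
            scope = "[" + ", ".join(self.scope_components) + "]"
            self.warnings.append(
                "Name collision: Group already contains a Metric with the "
                f"name '{name}'. Metric will not be reported.{scope}"
            )
            return False
        self.metrics[name] = metric
        return True


# With the constant scope "flink.jobmanager", every registration lands in
# the same group, so a second metric with the same name is dropped.
group = ToyMetricGroup(["flink", "jobmanager"])
group.register("totalNumberOfCheckpoints", 0)
group.register("totalNumberOfCheckpoints", 0)
print(group.warnings[0])
```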


I assume this happens because we now have two JobManager pods (one active,
one standby) that both report these metrics, which causes the collision. But
we cannot use the <host> placeholder in the scope; otherwise we won't be able
to build Datadog dashboards conveniently.

My questions:
- Has anyone else encountered this problem?
- How can we get the checkpointing metrics back in HA mode without needing
the <host> placeholder?

Thanks a lot
Clemens
