Re: Prometheus metrics does not work in 1.15.0 taskmanager

Mason Chen Tue, 03 May 2022 01:01:02 -0700

Hi ChangZhuo,

The warning indicates that a metric with the same name was already
registered in that metric group, so the runtime handles the "duplicate" by
ignoring it and not reporting it. This is typically a benign message unless
you rely on that metric. Is it possible that you are using the same task
name for different tasks? The name is set by the `.name(...)` API when you
build your job graph.


Can you clarify what you mean by the endpoint not working: are some
metrics missing, is the endpoint timing out, or something else? Also, can
you confirm from the logs that the PrometheusReporter was created properly?
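
If it helps to tell those cases apart, here is a minimal sketch of a probe
you could run against the taskmanager. It is only an illustration, not
anything shipped with Flink: the class name `EndpointProbe`, the helper
`containsMetric`, the URL, and the example metric name are all assumptions
on my side (9249 is the PrometheusReporter default port, so adjust it if
you configured a different port or a port range).

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Hypothetical helper: probes a Prometheus text endpoint and reports
// whether it is unreachable, timing out, or just missing one metric.
public class EndpointProbe {

    // Returns true if the exposition text has a sample line for metricName.
    // A sample line looks like "name{labels} value" or "name value".
    static boolean containsMetric(String body, String metricName) {
        for (String line : body.split("\n")) {
            if (line.startsWith("#")) {
                continue; // skip "# HELP" and "# TYPE" comment lines
            }
            if (line.startsWith(metricName + "{") || line.startsWith(metricName + " ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed defaults; pass your own URL and metric name as arguments.
        String url = args.length > 0 ? args[0] : "http://localhost:9249/metrics";
        String metric = args.length > 1 ? args[1] : "flink_taskmanager_Status_JVM_CPU_Load";

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
            System.out.println("Metric '" + metric + "' present: "
                    + containsMetric(response.body(), metric));
        } catch (HttpTimeoutException e) {
            System.out.println("Endpoint timed out: " + e.getMessage());
        } catch (ConnectException e) {
            System.out.println("Nothing is listening at " + url);
        } catch (IOException e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```

With Java 11 or newer this can be run directly, for example
`java EndpointProbe.java http://<taskmanager-host>:9249/metrics`, and the
output should show which of the cases above you are hitting.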

Best,
Mason

On Mon, May 2, 2022 at 7:25 PM ChangZhuo Chen (陳昌倬) <czc...@czchen.org>
wrote:

> Hi,
>
> We found that the taskmanager Prometheus endpoint does not work after
> upgrading from 1.14.3 to 1.15.0. The jobmanager Prometheus endpoint is
> okay in 1.15.0, so we think the problem is not in the image we use. Any
> idea how to fix this problem?
>
>
> Also, we found the following log entries in the taskmanager, but not in
> the jobmanager. We are not sure whether they are related to this issue.
>
>     2022-05-03 01:48:32,839 WARN  org.apache.flink.metrics.MetricGroup
>                      [] - Name collision: Group already contains a Metric
> with the name 'numBytesInLocal'. Metric will not be
> reported.[10.210.47.134, taskmanager, <redacted>, <redacted>, <redacted>,
> 8, Shuffle, Netty, Input]
>     2022-05-03 01:48:32,839 WARN  org.apache.flink.metrics.MetricGroup
>                      [] - Name collision: Group already contains a Metric
> with the name 'numBytesInLocalPerSecond'. Metric will not be
> reported.[10.210.47.134, taskmanager, <redacted>, <redacted>, <redacted>,
> 8, Shuffle, Netty, Input]
>     ...
>
>
> --
> ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
> http://czchen.info/
> Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B
>
