Please check the logs for warnings. It could be that a metric registered
by a job is throwing exceptions.
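Below is a minimal Python sketch of that log check. It assumes the TaskManager log uses Flink's default layout, where the level (`WARN`, `ERROR`) appears as its own token, and that exceptions show up either as a `...Exception`/`...Error` type name or as indented `at ...` stack frames. The log path in the usage comment is a hypothetical example, so adjust it to your installation.

```python
import re
import sys
from typing import Iterable, List

# Matches a standalone WARN or ERROR level token in a log line.
LEVEL_RE = re.compile(r"\b(WARN|ERROR)\b")
# Heuristic: a Java stack-trace frame ("\tat ...") or an exception type name
# such as ClassCastException, which usually follow the WARN/ERROR line.
STACK_RE = re.compile(r"^\s+at\s|\w+(Exception|Error)\b")


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return WARN/ERROR lines plus any exception or stack-trace lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if LEVEL_RE.search(line) or STACK_RE.search(line):
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python scan_log.py log/flink-*-taskexecutor-*.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for hit in suspicious_lines(fh):
                print(f"{path}: {hit}")
```

This is only a text filter. It does not know which metric threw. It narrows a long log down to the lines worth reading next to the timestamp of a failed scrape.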
On 20/04/2022 18:45, Peter Schrott wrote:
Hi huweihua,
Just to confirm: did you try with 1.15? None of the RCs work for me.
This port is definitely free, as it was already used on the same hosts
with Flink 1.14.4. And as I said, when no job is running on the
taskmanager, it does report metrics on that port; I only get the
"empty response" when a job is running on the taskmanager I am
querying. Did you also run a job, and could you access metrics like
flink_taskmanager_job_*?
The logs only tell me that everything is working fine:
2022-04-20 13:46:39,597 INFO [main] o.a.f.r.metrics.MetricRegistryImpl:? - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
and
2022-04-20 12:12:26,394 INFO [main] o.a.f.m.p.PrometheusReporter:? - Started PrometheusReporter HTTP server on port 4444
Best & thanks,
Peter
On Wed, Apr 20, 2022 at 6:30 PM huweihua <huweihua....@gmail.com> wrote:
Hi, Peter
I have not been able to reproduce this problem.
From your description, it is possible that the specified port 4444
is already being listened on by another process, so the
PrometheusReporter failed to start.
You can confirm this in taskmanager.log, or check whether port 4444 on
the host is being listened on by the TaskManager process.
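A quick way to run the port check from the host itself, sketched in Python. It assumes that a successful TCP connect to `localhost:4444` is enough to tell that some process accepts connections there. `is_port_listening` is a hypothetical helper name, not part of Flink.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, i.e. some
    process is accepting connections on that port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are both OSError subclasses.
        return False


if __name__ == "__main__":
    print("port 4444 listening:", is_port_listening("localhost", 4444))
```

This only shows that *something* is listening. To see which process owns the port (the TaskManager JVM or something else), a tool such as `ss -ltnp` or `lsof -i :4444` is still needed.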
On 20 Apr 2022, at 22:48, Peter Schrott <pe...@bluerootlabs.io> wrote:
Hi Flink-Users,
After upgrading from Flink 1.14 to Flink 1.15 (rc3), I noticed a
problem with the metrics exposed through the PrometheusReporter.
It is configured as follows in flink-conf.yaml:
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 4444
My cluster is running in standalone mode with 2 taskmanagers and
2 jobmanagers.
More specifically:
On the taskmanager that runs a job, I get "curl: (52) Empty reply
from server" when I call "curl localhost:4444". I was looking for
the metrics in the namespace flink_taskmanager_job_*, which are,
obviously, only exposed on the taskmanager running a job.
On the other taskmanager, which runs no job, I get a response with a
couple of metrics in the namespace flink_taskmanager_Status_*, as
expected.
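The `curl localhost:4444` check described above can also be scripted so that the job-scoped metric names are listed directly. Here is a minimal Python sketch. It assumes the reporter serves the Prometheus text exposition format at `http://localhost:4444/`, the same URL the curl call uses. `scrape` and `metric_names` are hypothetical helper names.

```python
import re
import urllib.request
from typing import Set

# A Prometheus sample line starts with the metric name, optionally followed
# by a {label="..."} set, then whitespace and the value.
NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(text: str, prefix: str = "") -> Set[str]:
    """Collect distinct metric names from Prometheus text exposition,
    skipping '# HELP' / '# TYPE' comment lines and blank lines."""
    names = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = NAME_RE.match(line)
        if m and m.group(1).startswith(prefix):
            names.add(m.group(1))
    return names


def scrape(url: str, timeout: float = 5.0) -> str:
    """Fetch the exposition text. A server that closes the connection
    without answering (curl error 52) raises http.client.RemoteDisconnected,
    which is a ConnectionError subclass."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    try:
        body = scrape("http://localhost:4444/")
    except ConnectionError:
        print("empty reply from server")
    else:
        for name in sorted(metric_names(body, "flink_taskmanager_job_")):
            print(name)
```

On a taskmanager without a job, the prefix filter would be expected to print nothing, while an unfiltered `metric_names(body)` should still show the `flink_taskmanager_Status_*` families.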
When I also configure the JMXReporterFactory, I find the desired
metrics, and all the others, via VisualVM on the taskmanager running
the job. Also, in the Flink web UI, under "Jobs -> Overview ->
Metrics", I can select and visualize metrics like
flink_taskmanager_job_task_busyTimeMsPerSecond.
Does anyone have an idea what's going on here, or can maybe even
confirm my findings?
Best & thanks,
Peter