Hi Chesnay,

I had a look at my logs; there are no WARNINGs regarding metrics or registering metrics when starting this job.
I ran the example jobs:
- ./examples/table/ChangelogSocketExample.jar (table streaming)
- ./examples/streaming/StateMachineExample.jar (streaming)

When running those jobs, the metrics on the taskmanagers are available. I will continue debugging my job, which uses the Flink Table API.

Thanks,
Peter

On Thu, Apr 21, 2022 at 9:12 AM Chesnay Schepler <ches...@apache.org> wrote:

> Please check the logs for warnings. It could be that a metric registered
> by a job is throwing exceptions.
>
> On 20/04/2022 18:45, Peter Schrott wrote:
>
> Hi huweihua,
>
> Just to confirm: did you try with 1.15? None of the RCs are working for me.
>
> This port is definitely free, as it was already used on the same hosts with
> Flink 1.14.4. And as I said, when no job is running on the taskmanager it
> actually reports metrics on that port - I only get the "empty
> response" when a job is running on the taskmanager I am querying. Did you
> also run a job and could you access metrics like flink_taskmanager_job_*?
>
> The logs only tell me that everything is working fine:
> 2022-04-20 13:46:39,597 INFO [main] o.a.f.r.metrics.MetricRegistryImpl:? -
> Reporting metrics for reporter prom of type
> org.apache.flink.metrics.prometheus.PrometheusReporter.
> and
> 2022-04-20 12:12:26,394 INFO [main] o.a.f.m.p.PrometheusReporter:? -
> Started PrometheusReporter HTTP server on port 4444
>
> Best & thanks,
> Peter
>
> On Wed, Apr 20, 2022 at 6:30 PM huweihua <huweihua....@gmail.com> wrote:
>
>> Hi, Peter
>>
>> I have not been able to reproduce this problem.
>>
>> From your description, it is possible that the specified port 4444 is
>> already being listened on by another process and the PrometheusReporter
>> failed to start. You can confirm this in the taskmanager log, or check
>> whether port 4444 on the host is actually being listened on by the
>> TaskManager process.
>>
>> On April 20, 2022 at 10:48 PM, Peter Schrott <pe...@bluerootlabs.io> wrote:
>>
>> Hi Flink-Users,
>>
>> After upgrading to Flink 1.15 (rc3) (coming from 1.14) I noticed that
>> there is a problem with the metrics exposed through the
>> PrometheusReporter.
>>
>> It is configured as follows in the flink-conf.yaml:
>> metrics.reporters: prom
>> metrics.reporter.prom.class:
>> org.apache.flink.metrics.prometheus.PrometheusReporter
>> metrics.reporter.prom.port: 4444
>>
>> My cluster is running in standalone mode with 2 taskmanagers and 2
>> jobmanagers.
>>
>> More specifically:
>>
>> On the taskmanager that runs a job I get "curl: (52) Empty reply from
>> server" when I call curl localhost:4444. I was looking for the metrics in
>> the namespace flink_taskmanager_job_*, which are only - and obviously -
>> exposed on the taskmanager running a job.
>>
>> On the other taskmanager that runs no job I get a response with a couple
>> of metrics of the namespace flink_taskmanager_Status - as expected.
>>
>> When additionally configuring the JMXReporterFactory, I can find the
>> desired metrics (and all others) via VisualVM on the taskmanager running
>> the job. Also in the Flink web UI, under "Jobs -> Overview -> Metrics",
>> I can select and visualize metrics like
>> flink_taskmanager_job_task_busyTimeMsPerSecond.
>>
>> Does someone have any idea what's going on here? Maybe someone can even
>> confirm my findings?
>>
>> Best & thanks,
>> Peter
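
For reference, a minimal sketch of the checks suggested in this thread (Chesnay's
log check and huweihua's port check), assuming a standalone deployment with the
default log directory; port 4444 matches the configuration above, but the log file
names and paths are examples and may differ in your setup:

    # Which process, if any, is listening on the reporter port?
    sudo lsof -iTCP:4444 -sTCP:LISTEN
    # or:
    ss -tlnp | grep ':4444'

    # Any reporter/metric-related warnings in the TaskManager log?
    grep -iE 'WARN.*(metric|prometheus|reporter)' log/flink-*-taskexecutor-*.log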