Hi Chesnay,

I had a look at my logs; there are no WARNINGs regarding metrics or registering metrics when starting this job.
I ran the example jobs:
- ./examples/table/ChangelogSocketExample.jar (table streaming)
- ./examples/streaming/StateMachineExample.jar (streaming)

When running those jobs, the metrics on the taskmanagers are available. I will continue debugging my job, which uses the Flink Table API.

Thanks,
Peter

On Thu, Apr 21, 2022 at 9:12 AM Chesnay Schepler <ches...@apache.org> wrote:

> Please check the logs for warnings. It could be that a metric registered
> by a job is throwing exceptions.
>
> On 20/04/2022 18:45, Peter Schrott wrote:
>
> Hi huweihua,
>
> Just to confirm: did you try with 1.15? None of the RCs are working for me.
>
> This port is definitely free, as it was already used on the same hosts with
> Flink 1.14.4. And as I said, when no job is running on the taskmanager it
> actually reports metrics on that port - I only get the "empty
> response" when a job is running on the taskmanager I am querying. Did you
> also run a job and could you access metrics like flink_taskmanager_job_*?
>
> The logs only tell me that everything is working fine:
> 2022-04-20 13:46:39,597 INFO [main] o.a.f.r.metrics.MetricRegistryImpl:? -
> Reporting metrics for reporter prom of type
> org.apache.flink.metrics.prometheus.PrometheusReporter.
> and
> 2022-04-20 12:12:26,394 INFO [main] o.a.f.m.p.PrometheusReporter:? -
> Started PrometheusReporter HTTP server on port 4444
>
> Best & thanks,
> Peter
>
> On Wed, Apr 20, 2022 at 6:30 PM huweihua <huweihua....@gmail.com> wrote:
>
>> Hi, Peter
>>
>> I have not been able to reproduce this problem.
>>
>> From your description, it is possible that the specified port 4444 is
>> already being listened on by another process and the PrometheusReporter
>> failed to start. You can confirm this in the taskmanager log, or check
>> whether port 4444 on the host is actually being listened on by the
>> TaskManager process.
>>
>> On April 20, 2022 at 10:48 PM, Peter Schrott <pe...@bluerootlabs.io> wrote:
>>
>> Hi Flink-Users,
>>
>> After upgrading to Flink 1.15 (rc3) (coming from 1.14) I noticed that
>> there is a problem with the metrics exposed through the
>> PrometheusReporter.
>>
>> It is configured as follows in the flink-conf.yaml:
>> metrics.reporters: prom
>> metrics.reporter.prom.class:
>> org.apache.flink.metrics.prometheus.PrometheusReporter
>> metrics.reporter.prom.port: 4444
>>
>> My cluster is running in standalone mode with 2 taskmanagers and 2
>> jobmanagers.
>>
>> More specifically:
>>
>> On the taskmanager that runs a job I get "curl: (52) Empty reply from
>> server" when I call curl localhost:4444. I was looking for the metrics in
>> the namespace flink_taskmanager_job_*, which are only - and obviously -
>> exposed on the taskmanager running a job.
>>
>> On the other taskmanager that runs no job I get a response with a couple
>> of metrics of the namespace flink_taskmanager_Status - as expected.
>>
>> When additionally configuring the JMXReporterFactory, I can find the
>> desired metrics (and all others) via VisualVM on the taskmanager running
>> the job. Also in the Flink web UI, under "Jobs -> Overview -> Metrics",
>> I can select and visualize metrics like
>> flink_taskmanager_job_task_busyTimeMsPerSecond.
>>
>> Does someone have any idea what's going on here? Maybe someone can even
>> confirm my findings?
>>
>> Best & thanks,
>> Peter
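
For reference, a minimal sketch of the checks suggested in this thread (Chesnay's
log check and huweihua's port check), assuming a standalone deployment with the
default log directory; port 4444 matches the configuration above, but the log file
names and paths are examples and may differ in your setup:

    # Which process, if any, is listening on the reporter port?
    sudo lsof -iTCP:4444 -sTCP:LISTEN
    # or:
    ss -tlnp | grep ':4444'

    # Any reporter/metric-related warnings in the TaskManager log?
    grep -iE 'WARN.*(metric|prometheus|reporter)' log/flink-*-taskexecutor-*.log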