The thing about the numRunningJobs metric is that I have to configure the
Prometheus rules in advance with the number of jobs I expect to be running
in order to alert; I really need this rule to alert on individual jobs. I
initially thought of flink_jobmanager_job_downtime{job_id=~".*"} == -1, but it
turned out that the metric just emits 0 for running jobs and doesn't emit -1
for failed jobs.
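To make the difference concrete, here is a minimal sketch in plain Python, not Prometheus or Flink code. It assumes each scrape is represented as a dict mapping a job_id to the value of its downtime series; that representation and the function names are hypothetical. A count-based check needs the expected number of jobs up front, while a per-job check only compares which job_ids had a series before and which still have one now, plus any job whose downtime is above 0 (the docs describe a positive value as the elapsed outage of a failing/recovering job).

```python
from typing import Dict, Set

# Hypothetical scrape representation: job_id -> value of the job's downtime series.
Scrape = Dict[str, float]


def count_based_alert(num_running_jobs: float, expected_jobs: int) -> bool:
    """Fire when fewer jobs run than the number configured in advance."""
    return num_running_jobs < expected_jobs


def vanished_jobs(previous: Scrape, current: Scrape) -> Set[str]:
    """Return job_ids that had a downtime series before but have none now."""
    return set(previous) - set(current)


def jobs_in_outage(current: Scrape) -> Set[str]:
    """Return job_ids whose downtime is positive, i.e. failing or recovering."""
    return {job_id for job_id, value in current.items() if value > 0}


if __name__ == "__main__":
    before = {"job-a": 0.0, "job-b": 0.0}
    after = {"job-a": 0.0}
    print(count_based_alert(len(after), expected_jobs=2))  # True
    print(vanished_jobs(before, after))                    # {'job-b'}
```

Note that a vanished series also covers jobs that were cancelled on purpose, so a per-job rule built this way would need some way to tell intentional stops apart. In PromQL, "present earlier but not now" is commonly expressed with `offset` combined with `unless`; treat that as an assumption to verify against your setup.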

On Mon, Dec 16, 2019, 7:01 PM, PoolakkalMukkath, Shakir <> wrote:

> You could use “flink_jobmanager_numRunningJobs” to check the number of
> running jobs.
> Thanks
> *From: *Jesús Vásquez <>
> *Date: *Monday, December 16, 2019 at 12:47 PM
> *To: *"" <>
> *Subject: *[EXTERNAL] Flink and Prometheus monitoring question
> Hi,
> I want to monitor Flink streaming jobs using Prometheus.
> My first goal is to send alerts when a Flink job has failed.
> The thing is that, looking at the documentation, I haven't found a metric
> that helps me define an alerting rule.
> As a starting point I thought that the metric
> flink_jobmanager_job_downtime could help, since the docs say this metric
> emits -1 for a completed job.
> But when I tested this I found out that it doesn't work, since the metric
> always emits 0 and after the job has completed there is no metric at all.
> Has anyone managed to alert on a failed Flink job with Prometheus?
> Thanks for your help.
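The behaviour described in this thread can be sketched as follows: a minimal plain-Python model, not Flink or Prometheus code. It assumes a missing series is represented as `None` and follows the documented meaning of the downtime values (0 for running, -1 for completed, positive while failing/recovering). The function names are hypothetical. Once a job ends its series disappears, so a rule that only matches the value -1 never sees a matching sample.

```python
from typing import List, Optional


def classify_downtime(value: Optional[float]) -> str:
    """Interpret one downtime sample; None means the series is absent."""
    if value is None:
        return "absent"
    if value == -1:
        return "completed"
    if value == 0:
        return "running"
    if value > 0:
        return "failing_or_recovering"
    return "unexpected"


def equals_minus_one_fires(samples: List[Optional[float]]) -> bool:
    """Mimic a `downtime == -1` rule: it can only match a sample that exists."""
    return any(v is not None and v == -1 for v in samples)


if __name__ == "__main__":
    # Timeline as observed in the thread: 0 while running, then no series at all.
    observed = [0.0, 0.0, None]
    print([classify_downtime(v) for v in observed])
    print(equals_minus_one_fires(observed))  # False
```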
