Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread Zhu Zhu
Hi Jesús, If your job has checkpointing enabled, you can monitor 'numberOfCompletedCheckpoints' to see whether the job is still alive and healthy. Thanks, Zhu Zhu Jesús Vásquez wrote on Tue, Dec 17, 2019 at 2:43 AM: > The thing about numRunningJobs metric is that i have to configure in > advance the Prometheu
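
[Editor's note] The check suggested above can be sketched as a comparison of two successive samples of 'numberOfCompletedCheckpoints' per job. This is only a minimal illustration, not the poster's setup: the function name, the job ids, and the idea of passing samples as plain dictionaries are assumptions, and the interval between samples would need to be longer than the job's checkpoint interval.

```python
from typing import Dict, List


def stalled_jobs(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return ids of jobs whose completed-checkpoint count did not grow.

    `previous` and `current` map a (hypothetical) job id to the value of
    numberOfCompletedCheckpoints at two successive sampling times.
    A job that is missing from `current` is also reported.
    """
    stalled = []
    for job_id, before in previous.items():
        after = current.get(job_id)
        # Missing sample or no new checkpoint since the last sample.
        if after is None or after <= before:
            stalled.append(job_id)
    return sorted(stalled)


if __name__ == "__main__":
    earlier = {"job-a": 5, "job-b": 7, "job-c": 1}  # hypothetical samples
    later = {"job-a": 6, "job-b": 7}
    print(stalled_jobs(earlier, later))
```

Because the comparison is done per job id, no expected job count has to be configured in advance.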

Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread Jesús Vásquez
The thing about the numRunningJobs metric is that I have to configure the Prometheus rules in advance with the number of jobs I expect to be running in order to alert, and I need this rule to alert on individual jobs. I initially thought of flink_jobmanager_downtime{job_id=~".*"} == -1 , but it res

Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread PoolakkalMukkath, Shakir
You could use “flink_jobmanager_numRunningJobs” to check the number of running jobs. Thanks From: Jesús Vásquez Date: Monday, December 16, 2019 at 12:47 PM To: "user@flink.apache.org" Subject: [EXTERNAL] Flink and Prometheus monitoring question Hi, I want to monitor Flink Streaming jobs using