Hi all, I'm monitoring Flink jobs using Prometheus.
I have been trying to use the metrics flink_jobmanager_job_uptime/downtime
in order to create an alert that fires when one of these values emits -1,
since the docs say this is the behavior of the metric when the job gets to
a completed state.
The thing is that I have tested the behavior when one of my jobs fails, and
the mentioned metrics never emit anything other than zero. Finally, the
metric disappears after the job has failed.
Am I missing something, or is this the expected behavior?