Hi,

I think you can use the REST API [1] to periodically fetch the job status from the JobManager and detect whether something has happened. The REST API also supports fetching the exception list for a specific job [2].
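A minimal sketch of that polling approach in Python, using only the standard library. It assumes the JobManager's REST endpoint is reachable at `http://localhost:8081` (an assumption, adjust for your deployment) and that the `/jobs/<jid>/exceptions` response carries an `exceptionHistory` object with an `entries` list, as in recent Flink versions. `check_jobs`, `failure_count`, `poll_forever` and the `notify` callback are hypothetical helpers, not part of Flink. The number of history entries is only a proxy for the restart count, and the history is capped in size, so treat it as a lower bound.

```python
import json
import time
import urllib.request
from typing import Callable, List, Tuple

# Assumed JobManager REST address; replace with your own.
FLINK_REST = "http://localhost:8081"

# A fetcher maps a REST path (e.g. "/jobs/overview") to the decoded JSON body.
Fetch = Callable[[str], dict]


def http_fetch(base: str = FLINK_REST, timeout: float = 10.0) -> Fetch:
    """Return a function that GETs a Flink REST path and decodes the JSON body."""
    def fetch(path: str) -> dict:
        with urllib.request.urlopen(base + path, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return fetch


def failure_count(exceptions: dict) -> int:
    """Count failures recorded in a /jobs/<jid>/exceptions response.

    Assumes the 'exceptionHistory' -> 'entries' layout; the history is
    bounded in size, so the result is a lower bound on actual failures.
    """
    history = exceptions.get("exceptionHistory") or {}
    return len(history.get("entries", []))


def check_jobs(fetch: Fetch, threshold: int) -> List[Tuple[str, str, str, int]]:
    """Return (jid, name, state, failures) for jobs that need attention.

    A job is flagged if it is FAILED or if its exception history holds
    at least `threshold` entries. FINISHED and CANCELED jobs are skipped.
    """
    flagged = []
    for job in fetch("/jobs/overview").get("jobs", []):
        jid, state = job["jid"], job["state"]
        if state in ("FINISHED", "CANCELED"):
            continue  # ended normally or was cancelled on purpose
        failures = failure_count(fetch(f"/jobs/{jid}/exceptions"))
        if state == "FAILED" or failures >= threshold:
            flagged.append((jid, job.get("name", ""), state, failures))
    return flagged


def poll_forever(fetch: Fetch, threshold: int,
                 notify: Callable[[list], None],
                 interval_s: float = 60.0) -> None:
    """Poll every `interval_s` seconds and hand any flagged jobs to `notify`."""
    while True:
        flagged = check_jobs(fetch, threshold)
        if flagged:
            notify(flagged)
        time.sleep(interval_s)
```

Passing the fetcher in as a function keeps the threshold logic testable without a cluster; against a real JobManager you would call something like `poll_forever(http_fetch(), threshold=5, notify=print)`. A production version would probably also remember which jobs it has already reported, so that a FAILED job does not trigger an alert on every poll.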
Best,
Shengkai

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-exceptions

unknown unknown <unimon...@gmail.com> wrote on Thursday, May 26, 2022 at 23:06:
> Hello Users!
>
> I would like to notify an external endpoint when a streaming job has a
> certain number of restarts. While I can use a service to continuously
> *poll* Flink metrics and identify failing jobs, I am looking to
> invert the relationship and have the job notify. We have around 50
> streaming jobs, and querying them on a continuous basis gets challenging.
>
> Looking into [1], the intrusive way would be to perform the action at [2]
> (not tested, though). Happy to hear suggestions and alternatives.
>
> [1] https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/execution/task_failure_recovery/#restart-strategies
> [2] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/FixedDelayRestartBackoffTimeStrategy.java#L68
>
> Thanks
> AK.