Thanks Shengkai! Unfortunately, this would require continuously querying the status of each job. Since very few pipelines experience failures, and those failures are few and far between, I am looking for a push-based model rather than polling.
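
To make the overhead concrete, here is a rough Python sketch of what the polling approach would involve. It is an illustration under assumptions, not a tested setup: I am assuming the JobManager REST API lists jobs at `/jobs` and exposes the `numRestarts` job metric at `/jobs/<jobid>/metrics?get=numRestarts`. The names `poll_once`, `restart_count`, `notify`, and the threshold are hypothetical helpers of mine, not anything Flink ships.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def restart_count(base_url, job_id, fetch=fetch_json):
    """Return the job's restart count, or 0 if the metric is not reported.

    Assumed response shape: [{"id": "numRestarts", "value": "3"}]
    """
    metrics = fetch(f"{base_url}/jobs/{job_id}/metrics?get=numRestarts")
    for metric in metrics:
        if metric.get("id") == "numRestarts":
            return int(float(metric["value"]))
    return 0


def poll_once(base_url, threshold, notified, notify, fetch=fetch_json):
    """Check every job once and notify for each job at or over the threshold.

    `notified` is a set of job ids already reported, so a job triggers
    at most one notification across repeated polls.
    Assumed response shape for /jobs: {"jobs": [{"id": "...", "status": "..."}]}
    """
    for job in fetch(f"{base_url}/jobs")["jobs"]:
        job_id = job["id"]
        restarts = restart_count(base_url, job_id, fetch)
        if restarts >= threshold and job_id not in notified:
            notify(job_id, restarts)  # e.g. POST to the external endpoint
            notified.add(job_id)
```

A service would call `poll_once("http://<jobmanager>:8081", 3, seen, notify)` in a loop with a sleep in between. That is one listing request plus one metrics request per job on every cycle, for all ~50 jobs, which is exactly the cost I would like to avoid.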

Thanks
AK

On Thu, May 26, 2022 at 7:21 PM Shengkai Fang <fskm...@gmail.com> wrote:

> Hi.
>
> I think you can use the REST OPEN API to fetch the job status from the
> JM periodically to detect whether something has happened. Currently the
> REST OPEN API also supports fetching the exception list for the
> specified job[2].
>
> Best,
> Shengkai
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-exceptions
>
> unknown unknown <unimon...@gmail.com> wrote on Thu, May 26, 2022 at 23:06:
>
>> Hello Users!
>>
>> I would like to notify an external endpoint when a streaming job has
>> had a certain number of restarts. While I can use a service to
>> continuously *poll* Flink metrics and identify failing jobs, I am
>> looking to invert the action and have the job send the notification.
>> We have around 50 streaming jobs, and querying them on a continuous
>> basis gets challenging.
>>
>> Looking into [1], the intrusive way would be to perform the action
>> at [2] (not tested, though). Happy to hear suggestions and alternatives.
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/execution/task_failure_recovery/#restart-strategies
>>
>> [2]
>> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/FixedDelayRestartBackoffTimeStrategy.java#L68
>>
>> Thanks
>> AK.