Are you only interested in exceptions that result in the job failing? If so, then https://issues.apache.org/jira/browse/FLINK-20833 may be of interest to you.

On 6/18/2021 5:15 PM, Kevin Lam wrote:
Hi all,

I'm interested in instrumenting an Apache Flink application so that we can monitor exceptions. What are the best practices here? Is there a good way to observe all the exceptions inside a Flink application, including Flink internals?

We are currently thinking of using Bugsnag, which has some steps to integrate with Java applications: https://docs.bugsnag.com/platforms/java/other/. This works fine for uncaught exceptions in the job manager / pipeline driver context, but doesn't catch anything outside of that.
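To illustrate the limitation, here is a minimal sketch using only the JDK. It assumes the reporter hooks in through a default uncaught-exception handler; the class name, the REPORTED list, and the helper methods are hypothetical stand-ins for a real reporting client, not Bugsnag's or Flink's API. Such a handler only sees exceptions that escape a thread in the JVM where it was installed, so anything caught and handled elsewhere, or thrown in another process, never reaches it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class UncaughtReporterSketch {
    // Hypothetical stand-in for a reporting client: collects errors instead of sending them.
    static final List<Throwable> REPORTED = new CopyOnWriteArrayList<>();

    // Installs a JVM-wide handler for exceptions that no code catches.
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> REPORTED.add(error));
    }

    // Runs a thread whose exception escapes; the handler is invoked before the thread ends.
    static void runFailingThread(String message) {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException(message);
        });
        startAndJoin(worker);
    }

    // Runs a thread that catches its own exception; the handler is never invoked for it.
    static void runCaughtThread(String message) {
        Thread worker = new Thread(() -> {
            try {
                throw new IllegalStateException(message);
            } catch (IllegalStateException handled) {
                // Swallowed here, so the exception never becomes "uncaught".
            }
        });
        startAndJoin(worker);
    }

    private static void startAndJoin(Thread worker) {
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        installHandler();
        runFailingThread("escaped");
        runCaughtThread("handled");
        System.out.println("reported exceptions: " + REPORTED.size());
    }
}
```

Only the exception that escapes its thread ends up in REPORTED. A handler like this would also have to be installed separately in every JVM that should report.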

We're also interested in reporting on exceptions that occur in the job execution context, e.g. in task managers.

Any tips/suggestions? I'd love to learn more about exception tracking and handling in Flink :)

