Are you only interested in exceptions that result in the job failing? If
so, then https://issues.apache.org/jira/browse/FLINK-20833 may be of
interest to you.
On 6/18/2021 5:15 PM, Kevin Lam wrote:
Hi all,
I'm interested in instrumenting an Apache Flink application so that we
can monitor exceptions. What are the best practices here? Is there a
good way to observe all the exceptions inside a Flink application,
including those raised by Flink internals?
We are currently thinking of using Bugsnag, which documents how to
integrate with Java applications:
https://docs.bugsnag.com/platforms/java/other/
That integration works fine for uncaught exceptions in the job manager /
pipeline driver context, but it doesn't catch anything outside of that.
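The reason this stops at the driver can be sketched in plain Java: a
hook installed with Thread.setDefaultUncaughtExceptionHandler applies
only to threads of the JVM that installed it, and task managers run in
separate JVMs. In the minimal sketch below, the `reported` list is a
hypothetical stand-in for an error-reporting client, not Bugsnag's API.
As an assumption worth verifying, Flink also catches exceptions thrown
by user functions in order to fail the task, so those may never reach
such a hook even on a task manager.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class UncaughtHookDemo {
    // Hypothetical stand-in for a real error-reporting client (not the Bugsnag API).
    static final List<Throwable> reported = new CopyOnWriteArrayList<>();

    // Installs a JVM-wide default handler. It only sees uncaught exceptions
    // on threads of *this* JVM; other processes (e.g. task managers) are unaffected.
    static void installHook() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> reported.add(error));
    }

    // Starts a thread that dies with an uncaught exception and waits for it.
    static void runFailingThread() {
        Thread t = new Thread(() -> {
            throw new IllegalStateException("boom");
        });
        t.start();
        try {
            t.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        installHook();
        runFailingThread();
        System.out.println("reported: " + reported.size());
    }
}
```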
We're also interested in reporting on exceptions that occur in the job
execution context, e.g., in task managers.
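One common pattern for task-side reporting is to wrap user code so that
exceptions are reported and then rethrown. In Flink this logic would
presumably live inside a user function (for example in map() of a
RichMapFunction, with the reporting client created in open() so that it
is initialized on the task manager). The minimal sketch below uses plain
java.util.function.Function to stay dependency-free; ErrorReporter,
CollectingReporter and reporting() are hypothetical names. It only
covers exceptions raised in your own code, not failures inside Flink
internals.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

public class ReportingWrapperDemo {
    /** Hypothetical reporting interface; a real client (e.g. Bugsnag) would sit behind it. */
    interface ErrorReporter {
        void report(Throwable error, String context);
    }

    /** In-memory reporter, used only so this sketch runs without external dependencies. */
    static final class CollectingReporter implements ErrorReporter {
        final List<String> contexts = new CopyOnWriteArrayList<>();

        @Override
        public void report(Throwable error, String context) {
            contexts.add(context + ": " + error.getMessage());
        }
    }

    /** Wraps a function so any RuntimeException is reported, then rethrown unchanged. */
    static <I, O> Function<I, O> reporting(Function<I, O> fn, ErrorReporter reporter, String context) {
        return input -> {
            try {
                return fn.apply(input);
            } catch (RuntimeException e) {
                reporter.report(e, context);
                throw e; // rethrow so the caller (e.g. the Flink task) still sees the failure
            }
        };
    }

    public static void main(String[] args) {
        CollectingReporter reporter = new CollectingReporter();
        Function<String, Integer> parse = reporting(Integer::parseInt, reporter, "parse-operator");
        System.out.println(parse.apply("42"));
        try {
            parse.apply("not a number");
        } catch (NumberFormatException expected) {
            System.out.println("rethrown: " + expected.getClass().getSimpleName());
        }
        System.out.println(reporter.contexts);
    }
}
```

Rethrowing keeps the framework's normal failure and restart handling
intact; the wrapper only adds a side-channel report.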
Any tips/suggestions? I'd love to learn more about exception tracking
and handling in Flink :)