Hi everyone,

This FLIP [1] proposes a pluggable interface for failure handling, allowing
users to implement custom failure logic using the plugin framework.
Motivated by existing proposals [2] and tickets [3], this enables use cases
such as assigning particular types to failures (e.g., User or System),
emitting custom metrics per type (e.g., application or platform), and even
exposing errors to downstream consumers (e.g., notification systems).

Thanks to Piotr and Anton for the initial reviews and discussions!

For anyone interested, the starting point is the FLIP [1] that I created,
which describes the motivation and the proposed changes (to the core,
runtime, and web components).

The idea behind this FLIP is to execute custom logic on failures by
exposing a FailureListener interface. User implementations can simply be
loaded into the system as JAR files. FailureListeners may also assign
failure tags (expressed as strings) to errors, which are then exposed as
metadata through the UI/REST interfaces.
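To make the idea a bit more concrete, here is a minimal, self-contained
sketch of what a user-provided listener could look like. The interface
shape, the onFailure(Throwable) method, the TypeClassifyingListener class
and the "type:User"/"type:System" tag strings are all hypothetical and
chosen for illustration only; the actual signatures are defined in the
FLIP [1] and may differ.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical interface illustrating the FLIP's idea; not the actual
// FLIP-304 API.
interface FailureListener {
    // Called for each failure; returns string tags to attach to it.
    Set<String> onFailure(Throwable cause);
}

// Hypothetical user implementation that classifies failures as User or
// System. Treating argument/state errors as user errors is an assumption.
class TypeClassifyingListener implements FailureListener {
    @Override
    public Set<String> onFailure(Throwable cause) {
        if (cause instanceof IllegalArgumentException
                || cause instanceof IllegalStateException) {
            return Collections.singleton("type:User");
        }
        return Collections.singleton("type:System");
    }
}

public class FailureListenerSketch {
    public static void main(String[] args) {
        FailureListener listener = new TypeClassifyingListener();
        // Prints [type:User]
        System.out.println(listener.onFailure(new IllegalArgumentException("bad input")));
        // Prints [type:System]
        System.out.println(listener.onFailure(new java.io.IOException("disk error")));
    }
}
```

In this sketch, the returned tags are what would surface as failure
metadata; a real implementation might also emit per-type metrics or notify
downstream systems.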

Feedback is always appreciated! Looking forward to your thoughts!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+failure+handling+for+Apache+Flink
[2] https://docs.google.com/document/d/1pcHg9F3GoDDeVD5GIIo2wO67Hmjgy0-hRDeuFnrMgT4
[3] https://issues.apache.org/jira/browse/FLINK-20833

Cheers,
Panagiotis
