Hi everyone,

This FLIP [1] proposes a pluggable interface for failure handling, allowing users to implement custom failure logic using the plugin framework. Motivated by existing proposals [2] and tickets [3], it enables use cases such as assigning particular types to failures (e.g., User or System), emitting custom metrics per type (e.g., application or platform), and even exposing errors to downstream consumers (e.g., notification systems).

Thanks to Piotr and Anton for the initial reviews and discussions!

For anyone interested, the starting point is the FLIP [1], which describes the motivation and the proposed changes (in core, runtime, and web).

The intuition behind this FLIP is to make it possible to execute custom logic on failures by exposing a FailureListener interface. User implementations can simply be loaded into the system as JAR files. FailureListeners may also assign failure tags (expressed as strings) to errors; these tags will then be exposed as metadata by the UI/REST interfaces.

Feedback is always appreciated! Looking forward to your thoughts!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+failure+handling+for+Apache+Flink
[2] https://docs.google.com/document/d/1pcHg9F3GoDDeVD5GIIo2wO67Hmjgy0-hRDeuFnrMgT4
[3] https://issues.apache.org/jira/browse/FLINK-20833

Cheers,
Panagiotis
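
P.S. To make the FailureListener idea a bit more concrete, here is a minimal, self-contained sketch of how a listener that assigns string tags might look. Only the name FailureListener comes from the FLIP; the method signature, the TypeTaggingListener and FailureTagger classes, the USER/SYSTEM tag values, and the rule used to classify exceptions are all assumptions made up for illustration. The FLIP [1] remains the reference for the actual proposed API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical shape of the listener; the real signature is whatever the FLIP defines.
interface FailureListener {
    // Returns the tags to attach to this failure (may be empty).
    Set<String> onFailure(Throwable cause);
}

// Example user plugin: classifies each failure as USER or SYSTEM.
final class TypeTaggingListener implements FailureListener {
    @Override
    public Set<String> onFailure(Throwable cause) {
        // Assumption for illustration only: bad arguments and arithmetic
        // errors are treated as user errors, everything else as system errors.
        boolean userError = cause instanceof IllegalArgumentException
                || cause instanceof ArithmeticException;
        return Collections.singleton(userError ? "USER" : "SYSTEM");
    }
}

// Stand-in for the runtime side: runs all registered listeners and merges their tags.
final class FailureTagger {
    private final List<FailureListener> listeners = new ArrayList<>();

    void register(FailureListener listener) {
        listeners.add(listener);
    }

    Set<String> tagsFor(Throwable cause) {
        Set<String> tags = new LinkedHashSet<>();
        for (FailureListener listener : listeners) {
            try {
                tags.addAll(listener.onFailure(cause));
            } catch (RuntimeException e) {
                // A misbehaving listener should not mask the original failure,
                // so its exception is ignored in this sketch.
            }
        }
        return tags;
    }
}
```

In this sketch, registering a TypeTaggingListener and calling tagsFor with an IllegalArgumentException yields the single tag USER, while an IOException yields SYSTEM. Merging the results of all listeners into one set is just one possible design; it keeps listeners independent of each other and lets the UI/REST layer expose the tags as plain metadata.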