[ https://issues.apache.org/jira/browse/FLINK-36404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hong Liang Teoh updated FLINK-36404:
------------------------------------
    Affects Version/s: prometheus-connector-1.0.0

> PrometheusSinkWriteException thrown by the response callback may not cause job to fail
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-36404
>                 URL: https://issues.apache.org/jira/browse/FLINK-36404
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Prometheus
>    Affects Versions: prometheus-connector-1.0.0
>            Reporter: Lorenzo Nicora
>            Priority: Critical
>
> *Issue*
> A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does not
> cause the httpclient IOReactor to fail. The exception is swallowed, which
> prevents the job from failing.
> Related: exceptions from the IOReactor eventually cause the response
> callback's {{failed}} method to be called. Allowing the user to set
> DISCARD_AND_CONTINUE for generic exceptions thrown by the client may hide
> rethrown exceptions. There is also no real use in not failing on a generic
> unhandled exception from the client.
> *Solution*
> 1. Intercept {{PrometheusSinkWriteException}} further up the httpclient stack
> by adding to the client an {{IOSessionListener}} that can rethrow those
> exceptions, causing the reactor to actually fail and, consequently, the
> operator to fail as well.
> 2. Remove the ability to configure the error handling behaviour for generic
> exceptions thrown by the httpclient. The job should always fail.
> 3. When the httpclient IOReactor fails, a long chain of exceptions is logged.
> To keep the actual root cause evident, the response callback should log at
> ERROR level when the exception happens.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)