[jira] [Updated] (FLINK-36404) PrometheusSinkWriteException thrown by the response callback may not cause job to fail

Hong Liang Teoh (Jira) Mon, 07 Oct 2024 01:44:04 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-36404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hong Liang Teoh updated FLINK-36404:
------------------------------------
    Affects Version/s: prometheus-connector-1.0.0

> PrometheusSinkWriteException thrown by the response callback may not cause 
> job to fail
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-36404
>                 URL: https://issues.apache.org/jira/browse/FLINK-36404
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Prometheus
>    Affects Versions: prometheus-connector-1.0.0
>            Reporter: Lorenzo Nicora
>            Priority: Critical
>
> *Issue*
> A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does 
> not cause the httpclient IOReactor to fail. The exception is swallowed, so 
> the job does not fail.
> A related problem: exceptions from the IOReactor eventually cause the 
> response callback's {{failed}} method to be called. Allowing the user to set 
> DISCARD_AND_CONTINUE for generic exceptions thrown by the client may hide 
> rethrown exceptions. There is also no good reason not to fail on a generic 
> unhandled exception from the client.
> *Solution*
> 1. Intercept {{PrometheusSinkWriteException}} higher up the httpclient 
> stack by adding an {{IOSessionListener}} to the client that can rethrow 
> those exceptions, causing the reactor to actually fail and, consequently, 
> the operator to fail as well.
> 2. Remove the ability to configure the error-handling behaviour for generic 
> exceptions thrown by the httpclient. The job should always fail.
> 3. When the httpclient IOReactor fails, a long chain of exceptions is 
> logged. To keep the actual root cause evident, the response callback should 
> log at ERROR level when the exception happens.
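
The swallowing behaviour and the idea behind solution item 1 can be sketched as a toy model in plain Java. This is only an illustration under stated assumptions: ToyReactor, ExceptionListener and SinkWriteException are hypothetical stand-ins invented for this sketch, and they are not the httpclient IOReactor, the IOSessionListener interface or the connector's PrometheusSinkWriteException.

```java
public class SwallowedExceptionSketch {

    /** Hypothetical stand-in for the connector's fatal write exception. */
    public static class SinkWriteException extends RuntimeException {
        public SinkWriteException(String message) {
            super(message);
        }
    }

    /** Hypothetical hook that is told about exceptions raised inside the loop. */
    public interface ExceptionListener {
        void onException(Exception e);
    }

    /** Toy event loop: catches whatever a callback throws and hands it to the listener. */
    public static class ToyReactor {
        private final ExceptionListener listener;

        public ToyReactor(ExceptionListener listener) {
            this.listener = listener;
        }

        public void dispatch(Runnable callback) {
            try {
                callback.run();
            } catch (Exception e) {
                // If the listener returns normally, the exception is lost here.
                listener.onException(e);
            }
        }
    }

    /** Listener that ignores everything: models the swallowing described in the issue. */
    public static final ExceptionListener SWALLOW = e -> { };

    /** Listener that rethrows the fatal exception so it escapes the loop. */
    public static final ExceptionListener RETHROW = e -> {
        if (e instanceof SinkWriteException) {
            throw (SinkWriteException) e;
        }
    };

    /** Returns true if the fatal exception escapes the reactor, i.e. the "job" would fail. */
    public static boolean jobFails(ExceptionListener listener) {
        ToyReactor reactor = new ToyReactor(listener);
        try {
            reactor.dispatch(() -> {
                throw new SinkWriteException("remote write rejected");
            });
            return false;
        } catch (SinkWriteException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("swallowing listener, job fails: " + jobFails(SWALLOW));
        System.out.println("rethrowing listener, job fails: " + jobFails(RETHROW));
    }
}
```

In this model only the rethrowing listener lets the exception reach the caller, which corresponds to the reactor failing and the operator failing with it.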



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
