[ 
https://issues.apache.org/jira/browse/FLINK-37271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925915#comment-17925915
 ] 

Piotr Nowojski commented on FLINK-37271:
----------------------------------------

If I understand correctly, you would like Flink's TMs to reconnect after a 
network failure without restarting the job? That doesn't look easy, because 
you would have to guarantee data integrity and potentially re-send some 
buffers/records. It would also mean the sender has to hold on to the buffered 
output data until the receiver acknowledges that it has received it.
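
To make the cost of that concrete, here is a minimal sketch of the sender-side 
bookkeeping such a scheme would need, assuming per-channel sequence numbers 
and cumulative acknowledgements. RetransmitBuffer and its methods are 
hypothetical names for illustration only; they are not existing Flink classes 
and say nothing about how Flink's network stack is actually structured.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sender-side buffer; NOT part of Flink's network stack. */
final class RetransmitBuffer {

    /** A payload tagged with the sequence number it was sent under. */
    static final class Entry {
        final long seq;
        final byte[] payload;

        Entry(long seq, byte[] payload) {
            this.seq = seq;
            this.payload = payload;
        }
    }

    // Buffers that were sent but not yet acknowledged, oldest first.
    private final Deque<Entry> unacked = new ArrayDeque<>();
    private long nextSeq = 0;

    /** Records a buffer as sent and returns its sequence number. */
    long send(byte[] payload) {
        long seq = nextSeq++;
        unacked.addLast(new Entry(seq, payload));
        return seq;
    }

    /** Cumulative ack: the receiver has everything up to and including ackedSeq. */
    void acknowledge(long ackedSeq) {
        while (!unacked.isEmpty() && unacked.peekFirst().seq <= ackedSeq) {
            unacked.removeFirst();
        }
    }

    /** After a reconnect, these buffers would have to be re-sent in order. */
    List<Entry> pendingForResend() {
        return new ArrayList<>(unacked);
    }
}
```

Memory for a buffer is released only in acknowledge(), so the amount of 
retained output grows with the acknowledgement delay. That retention is the 
extra cost described above.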

> Add network channel reconnect capability
> ----------------------------------------
>
>                 Key: FLINK-37271
>                 URL: https://issues.apache.org/jira/browse/FLINK-37271
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Zhenqiu Huang
>            Priority: Minor
>             Fix For: 1.20.1, 1.20.2
>
>
> In our org, we use a security proxy for secure inter-host communication. 
> During a proxy rollout, the channels between TMs are disconnected, which 
> causes downtime. In addition, we can't guarantee that the proxy is rolled 
> out to all hosts at the same time, so a job can fail multiple times during 
> the proxy rollout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
