[ https://issues.apache.org/jira/browse/FLINK-37271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925915#comment-17925915 ]
Piotr Nowojski commented on FLINK-37271:
----------------------------------------

If I understand correctly, you would like Flink's TaskManagers (TMs) to reconnect after a network failure without restarting the job? That doesn't look easy to do, because you would have to ensure data integrity and potentially re-send some buffers/records. It would also mean the sender has to hold on to its buffered output data until the receiver acknowledges that it has received it.

> Add network channel reconnect capability
> ----------------------------------------
>
>                 Key: FLINK-37271
>                 URL: https://issues.apache.org/jira/browse/FLINK-37271
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Zhenqiu Huang
>            Priority: Minor
>             Fix For: 1.20.1, 1.20.2
>
>
> In our org, we use a security proxy to secure inter-host
> communication. During the proxy rollout, channels between TMs are
> disconnected, which causes downtime. Besides this, we can't guarantee that
> the proxy is rolled out to all hosts at the same time, so a job could fail
> multiple times during the rollout.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
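A minimal sketch of the acknowledgement scheme described in the comment, assuming a hypothetical protocol in which every buffer carries a sequence number. `ReplaySketch`, `Frame`, `Channel`, `ReplayableSender` and `DedupReceiver` are illustrative names only; they are not part of Flink's runtime. The sender retains each frame until a cumulative acknowledgement releases it, and on reconnect re-sends whatever the receiver has not confirmed; the receiver drops duplicates so a replay does not corrupt the stream (requires Java 16+ for records).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these types exist in Flink.
public class ReplaySketch {

    /** A sent buffer tagged with a monotonically increasing sequence number. */
    record Frame(long seq, byte[] payload) {}

    /** Stand-in for a network connection (hypothetical). */
    interface Channel {
        void write(Frame frame);
    }

    /** Sender side: keeps every frame until the receiver acknowledges it. */
    static final class ReplayableSender {
        private final Deque<Frame> unacked = new ArrayDeque<>();
        private long nextSeq = 0;
        private Channel channel;

        ReplayableSender(Channel channel) {
            this.channel = channel;
        }

        void send(byte[] payload) {
            Frame frame = new Frame(nextSeq++, payload);
            unacked.addLast(frame); // retain a copy in case it must be re-sent
            channel.write(frame);
        }

        /** Cumulative ack: every frame up to and including seq was received. */
        void acknowledge(long seq) {
            while (!unacked.isEmpty() && unacked.peekFirst().seq() <= seq) {
                unacked.removeFirst();
            }
        }

        /** After reconnecting, release what the receiver already has and re-send the rest. */
        void reconnect(Channel newChannel, long lastReceivedSeq) {
            this.channel = newChannel;
            acknowledge(lastReceivedSeq);
            for (Frame frame : unacked) {
                channel.write(frame);
            }
        }

        int retainedFrames() {
            return unacked.size();
        }
    }

    /** Receiver side: accepts frames strictly in order and ignores duplicates. */
    static final class DedupReceiver implements Channel {
        private long expectedSeq = 0;
        final List<byte[]> delivered = new ArrayList<>();

        @Override
        public void write(Frame frame) {
            if (frame.seq() == expectedSeq) {
                delivered.add(frame.payload());
                expectedSeq++;
            }
            // seq < expectedSeq: duplicate from a replay, dropped.
            // seq > expectedSeq: a gap; dropped here and recovered by the next replay.
        }

        /** Highest sequence number delivered so far (-1 if none). */
        long lastReceivedSeq() {
            return expectedSeq - 1;
        }
    }
}
```

The retained `unacked` deque is exactly the cost the comment points out: the sender's memory grows with every buffer that the receiver has not yet acknowledged, so a long disconnect would have to be bounded by back-pressure or eventually fail the job anyway.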