[ https://issues.apache.org/jira/browse/FLINK-10462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641324#comment-16641324 ]
zhijiang commented on FLINK-10462:
----------------------------------

[~NicoK], thanks for your feedback!

In our special case, if many different vertex tasks are deployed in the same {{TaskManager}} in session mode, a large number of TCP connections would be established for large-scale jobs. Furthermore, establishing one TCP connection sometimes takes about two seconds, which hurts performance in the TPC-H benchmark.

I agree with your point that reusing the same TCP connection without limit may cause a network bottleneck. Maybe we can strike a trade-off by making the connection pool configurable, as you suggested.

I think we can suspend this Jira temporarily until we reach a better solution, and I will think about this issue further later. :)

> Remove ConnectionIndex for further sharing tcp connection in credit-based
> mode
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-10462
>                 URL: https://issues.apache.org/jira/browse/FLINK-10462
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0, 1.6.1, 1.5.4
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>
> Every {{IntermediateResult}} generates a random {{ConnectionIndex}} which
> will be included in {{ConnectionID}}.
> The {{RemoteInputChannel}} requests to establish a TCP connection via
> {{ConnectionID}}. That means one TCP connection may be shared by multiple
> {{RemoteInputChannel}}s which have the same {{ConnectionID}}. By doing so, we
> can reduce the number of physical connections between two {{TaskManager}}s,
> which benefits large-scale jobs.
> But this sharing is limited to the same {{IntermediateResult}}, and I
> think this is mainly because we may temporarily switch off {{autoread}} for the
> channel during back pressure in the previous network flow control.
> For credit-based mode, the channel is always open for transporting different
> intermediate data, so we can remove this limit and share the TCP connection
> across different {{IntermediateResult}}s.
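
To make the effect of the key on connection sharing concrete, here is a minimal toy model in Python. It is not Flink code: the {{ConnectionPool}} class, the {{count_connections}} helper and the remote address are hypothetical names introduced only for illustration, and the only assumption taken from the description above is that a connection is looked up by a key made of the remote address plus a per-result connection index.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class ConnectionID:
    """Toy stand-in for the connection key: remote address plus connection index."""
    address: str
    connection_index: int


class ConnectionPool:
    """Hypothetical pool that hands out one simulated connection per distinct key."""

    def __init__(self):
        self._connections = {}

    def get_or_create(self, connection_id):
        # Channels that present an equal key reuse the same connection object.
        if connection_id not in self._connections:
            self._connections[connection_id] = object()
        return self._connections[connection_id]

    def size(self):
        return len(self._connections)


def count_connections(num_results, channels_per_result, use_connection_index, seed=0):
    """Count simulated connections opened towards a single remote TaskManager."""
    rng = random.Random(seed)
    pool = ConnectionPool()
    remote = "taskmanager-2:6121"  # hypothetical address, for illustration only
    # Draw distinct indices so the toy result does not depend on random collisions.
    indices = rng.sample(range(2**31 - 1), num_results)
    for index in indices:
        # Dropping the index from the key is modelled by using a constant instead.
        key_index = index if use_connection_index else 0
        for _ in range(channels_per_result):
            pool.get_or_create(ConnectionID(remote, key_index))
    return pool.size()


if __name__ == "__main__":
    # With the index in the key: one connection per intermediate result.
    print(count_connections(3, 4, use_connection_index=True))
    # Without it: all channels to the same remote share a single connection.
    print(count_connections(3, 4, use_connection_index=False))
```

In this model the number of connections to one remote grows with the number of intermediate results when the index is part of the key, and collapses to one when it is not. The sketch says nothing about throughput, which is the bottleneck concern raised in the comment above; a configurable pool size would sit between these two extremes.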