[ https://issues.apache.org/jira/browse/FLINK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048840#comment-17048840 ]
Zhu Zhu commented on FLINK-15249:
---------------------------------

[~nppoly] sorry for the late response. I just checked the PR and ran the test again.

It looks to me that this change targets the region building performance of a specific topology that is rare in production. However, the performance for the most common topologies becomes worse (I tested a 4000x4000 All-to-All pipelined connected topology, and the new change is much slower: 1570ms vs. 929ms).
I think we should not regress the common cases to improve a corner case, so I would say not to make this change.

Note that the set merging cost should not be the critical part of region building if there are All-to-All connections, since the edge iteration complexity would be much larger (V^2 compared to V). If there are no All-to-All connections, the region building time is usually low and not a problem.

> Improve PipelinedRegions calculation with Union Set
> ---------------------------------------------------
>
>                 Key: FLINK-15249
>                 URL: https://issues.apache.org/jira/browse/FLINK-15249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Chongchen Chen
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: PipelinedRegionComputeUtil.diff, RegionFailoverPerfTest.java, new.diff
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Union Set's Merge Set cost is O(1). The current implementation is O(N). The attachment is a patch.
> [Disjoint Set Data Structure|https://en.wikipedia.org/wiki/Disjoint-set_data_structure]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
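
For readers unfamiliar with the "Union Set" referred to in the issue description, the following is a minimal sketch of a disjoint-set (union-find) structure over integer vertex ids. It is not the code from the attached patch and not Flink's PipelinedRegionComputeUtil; the class name DisjointSetSketch and its methods are hypothetical and chosen only for illustration. With union by size and path halving, a merge is near-constant amortized time rather than strictly O(1).

```java
// Hypothetical illustration only; not taken from the FLINK-15249 patch.
final class DisjointSetSketch {
    private final int[] parent; // parent[i] == i means i is a set representative
    private final int[] size;   // size[r] is valid only for representatives r
    private int setCount;       // number of disjoint sets ("regions") remaining

    DisjointSetSketch(int n) {
        parent = new int[n];
        size = new int[n];
        setCount = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every vertex starts in its own set
            size[i] = 1;
        }
    }

    // Returns the representative of x's set, halving the path along the way.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Merges the sets containing a and b; returns false if they were already merged.
    boolean union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return false;
        }
        if (size[ra] < size[rb]) { // attach the smaller tree under the larger one
            int tmp = ra;
            ra = rb;
            rb = tmp;
        }
        parent[rb] = ra;
        size[ra] += size[rb];
        setCount--;
        return true;
    }

    int setCount() {
        return setCount;
    }
}
```

In such a scheme, union would be called once per pipelined edge. That is consistent with the comment above: in an All-to-All topology the number of edges grows quadratically with the number of vertices, so iterating the edges dominates however cheap each individual merge is.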