[ 
https://issues.apache.org/jira/browse/FLINK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048840#comment-17048840
 ] 

Zhu Zhu commented on FLINK-15249:
---------------------------------

[~nppoly] sorry for the late response. I just checked the PR and ran the test 
again.
It looks to me like this change aims to improve region building performance 
for a specific topology that is rare in production. However, the performance 
for the most common topologies becomes worse: I tested a 4000x4000 All-to-All 
pipelined topology, and region building with the new change was much slower 
(1570 ms vs. 929 ms).

I think we should not regress the common cases to improve a corner case, so I 
would suggest not making this change.

Note that the set merging cost should not be the critical part of region 
building if there are All-to-All connections, since the edge iteration 
complexity is much larger (V^2 compared to V). If there are no All-to-All 
connections, the region building time is usually low and not a problem.
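
To illustrate the V^2 versus V argument, here is a minimal sketch. It is not 
Flink's actual PipelinedRegionComputeUtil code: the class and method names 
(RegionBuildingSketch, DisjointSet, buildAllToAll) are hypothetical, and 
vertices are assumed to be plain integer indices. It builds a single region 
for an All-to-All pipelined topology with a disjoint-set structure and counts 
how many edges are visited compared to how many merges actually happen.

```java
class RegionBuildingSketch {

    /** Minimal disjoint-set (union-find) with path halving and union by size. */
    static final class DisjointSet {
        private final int[] parent;
        private final int[] size;
        long merges; // number of unions that actually joined two distinct sets

        DisjointSet(int n) {
            parent = new int[n];
            size = new int[n];
            for (int i = 0; i < n; i++) {
                parent[i] = i;
                size[i] = 1;
            }
        }

        int find(int x) {
            while (parent[x] != x) {
                parent[x] = parent[parent[x]]; // path halving
                x = parent[x];
            }
            return x;
        }

        void union(int a, int b) {
            int ra = find(a);
            int rb = find(b);
            if (ra == rb) {
                return; // already in the same region, nothing to merge
            }
            if (size[ra] < size[rb]) {
                int tmp = ra;
                ra = rb;
                rb = tmp;
            }
            parent[rb] = ra; // attach the smaller set to the larger one
            size[ra] += size[rb];
            merges++;
        }
    }

    /**
     * Connects every producer to every consumer through a pipelined edge.
     * Returns {edgeVisits, merges}.
     */
    static long[] buildAllToAll(int producers, int consumers) {
        DisjointSet regions = new DisjointSet(producers + consumers);
        long edgeVisits = 0;
        for (int p = 0; p < producers; p++) {
            for (int c = 0; c < consumers; c++) {
                edgeVisits++; // every edge is iterated once
                regions.union(p, producers + c);
            }
        }
        return new long[] {edgeVisits, regions.merges};
    }

    public static void main(String[] args) {
        long[] result = buildAllToAll(4000, 4000);
        // prints "edge visits = 16000000, merges = 7999"
        System.out.println("edge visits = " + result[0] + ", merges = " + result[1]);
    }
}
```

In this sketch the 4000x4000 case visits 16,000,000 edges but performs only 
7,999 real merges, so a cheaper merge can save at most a small share of the 
total work, while any extra per-edge overhead is paid 16 million times.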

> Improve PipelinedRegions calculation with Union Set
> ---------------------------------------------------
>
>                 Key: FLINK-15249
>                 URL: https://issues.apache.org/jira/browse/FLINK-15249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Chongchen Chen
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: PipelinedRegionComputeUtil.diff, 
> RegionFailoverPerfTest.java, new.diff
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The merge operation of a union set (disjoint set) costs O(1), while the 
> current implementation is O(N). The attachment is a patch.
> [Disjoint Set Data Structure|https://en.wikipedia.org/wiki/Disjoint-set_data_structure]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
