harker2015 commented on code in PR #20739: URL: https://github.com/apache/flink/pull/20739#discussion_r970409594
##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/PipelinedRegionSchedulingStrategy.java:
##########
@@ -207,72 +214,97 @@ public void onExecutionStateChange(
             final ExecutionVertexID executionVertexId, final ExecutionState executionState) {
         if (executionState == ExecutionState.FINISHED) {
             maybeScheduleRegions(
-                    getDownstreamRegionsOfVertex(schedulingTopology.getVertex(executionVertexId)));
+                    getBlockingDownstreamRegionsOfVertex(
+                            schedulingTopology.getVertex(executionVertexId)));
         }
     }
 
     @Override
     public void onPartitionConsumable(final IntermediateResultPartitionID resultPartitionId) {}
 
     private void maybeScheduleRegions(final Set<SchedulingPipelinedRegion> regions) {
-        final List<SchedulingPipelinedRegion> regionsSorted =
-                SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder(
-                        schedulingTopology, regions);
+        final Set<SchedulingPipelinedRegion> regionsToSchedule = new LinkedHashSet<>();
+        LinkedHashSet<SchedulingPipelinedRegion> nextRegions = new LinkedHashSet<>(regions);
+        while (!nextRegions.isEmpty()) {
+            nextRegions = addSchedulableAndGetNextRegions(nextRegions, regionsToSchedule);
+        }
+        // schedule regions in topological order.
+        SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder(
+                        schedulingTopology, regionsToSchedule)
+                .forEach(this::scheduleRegion);
+    }
 
+    private LinkedHashSet<SchedulingPipelinedRegion> addSchedulableAndGetNextRegions(
+            Set<SchedulingPipelinedRegion> currentRegions,
+            Set<SchedulingPipelinedRegion> regionsToSchedule) {
+        LinkedHashSet<SchedulingPipelinedRegion> nextRegions = new LinkedHashSet<>();
+        // cache consumedPartitionGroup's consumable status to avoid compute repeatedly.
         final Map<ConsumedPartitionGroup, Boolean> consumableStatusCache = new HashMap<>();
-        final Set<SchedulingPipelinedRegion> downstreamSchedulableRegions = new HashSet<>();
-        for (SchedulingPipelinedRegion region : regionsSorted) {
-            if (maybeScheduleRegion(region, consumableStatusCache)) {
-                downstreamSchedulableRegions.addAll(
-                        consumedPartitionGroupsOfRegion.getOrDefault(region, Collections.emptySet())
-                                .stream()
-                                .flatMap(
-                                        consumedPartitionGroups ->
-                                                partitionGroupConsumerRegions
-                                                        .getOrDefault(
-                                                                consumedPartitionGroups,
-                                                                Collections.emptySet())
-                                                        .stream())
-                                .collect(Collectors.toSet()));
+        final Set<ConsumedPartitionGroup> visitedConsumedPartitionGroups = new HashSet<>();
+
+        for (SchedulingPipelinedRegion currentRegion : currentRegions) {
+            if (isRegionSchedulable(currentRegion, consumableStatusCache, regionsToSchedule)) {
+                regionsToSchedule.add(currentRegion);
+                producedPartitionGroupsOfRegion
+                        .getOrDefault(currentRegion, Collections.emptySet())
+                        .forEach(
+                                (consumedPartitionGroup) -> {

Review Comment:
   For consistency, shall we use producedPartitionGroup instead of consumedPartitionGroup?
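For context on the loop being reviewed: the new `maybeScheduleRegions` repeatedly calls `addSchedulableAndGetNextRegions` until no further regions become schedulable, then schedules the accumulated set in topological order. Below is a standalone, simplified sketch of that fixed-point expansion, not Flink code: region names, the `blockingConsumers` map, and the always-true schedulability check are invented for illustration, whereas the real strategy works on `SchedulingPipelinedRegion`, `ConsumedPartitionGroup`, and a consumable-status cache.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of the iterative region-expansion idea (hypothetical types). */
public class RegionExpansionSketch {

    /**
     * Expands the initial set of regions by repeatedly adding the downstream
     * regions reachable over blocking edges, until a fixed point is reached.
     *
     * @param blockingConsumers hypothetical adjacency: region -> regions that
     *     consume its blocking (finished) outputs
     */
    static Set<String> expand(
            Set<String> initialRegions, Map<String, Set<String>> blockingConsumers) {
        LinkedHashSet<String> regionsToSchedule = new LinkedHashSet<>();
        LinkedHashSet<String> nextRegions = new LinkedHashSet<>(initialRegions);
        while (!nextRegions.isEmpty()) {
            LinkedHashSet<String> discovered = new LinkedHashSet<>();
            for (String region : nextRegions) {
                // The real code checks whether the region is schedulable via a
                // consumable-status cache; this sketch accepts every candidate.
                if (regionsToSchedule.add(region)) {
                    discovered.addAll(blockingConsumers.getOrDefault(region, Set.of()));
                }
            }
            // Only regions not already collected feed the next iteration.
            discovered.removeAll(regionsToSchedule);
            nextRegions = discovered;
        }
        return regionsToSchedule;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> consumers =
                Map.of("r1", Set.of("r2"), "r2", Set.of("r3"));
        System.out.println(expand(Set.of("r1"), consumers)); // prints [r1, r2, r3]
    }
}
```

In the actual change, the resulting set is then passed through `SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder` before scheduling, so the iteration order of the sketch above only affects discovery, not the final scheduling order.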