1996fanrui commented on code in PR #25218:
URL: https://github.com/apache/flink/pull/25218#discussion_r1771234937
##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java:
##########
```
@@ -139,6 +154,36 @@ public Collection<SlotAssignment> assignSlots(
         return assignments;
     }

+    /**
+     * The sorting principle and strategy here are very similar to {@link
```

Review Comment:
Let's discuss this topic in this thread. Besides this comment https://github.com/apache/flink/pull/25218#issuecomment-2367911956, I want to add a case.

IIUC, this bug becomes critical when `taskmanager.numberOfTaskSlots` is high. For example: `taskmanager.numberOfTaskSlots` is 10 and the job parallelism is 100, so the job needs 10 TMs.

- TM0 : runs subtask0 to subtask9
- TM1 : runs subtask10 to subtask19
- TM2 : runs subtask20 to subtask29
- ...
- TM9 : runs subtask90 to subtask99

When the parallelism is changed from 100 to 10, ideally the job only needs 1 TM:

- New subtask0 : processes data from old subtask0 to subtask9
- New subtask1 : processes data from old subtask10 to subtask19
- New subtask2 : processes data from old subtask20 to subtask29
- ...
- New subtask9 : processes data from old subtask90 to subtask99

But after state locality is applied:

- The state of new subtask0 is from TM0
- The state of new subtask1 is from TM1
- The state of new subtask2 is from TM2
- ...
- The state of new subtask9 is from TM9

So after rescaling, none of the TMs can be released, which makes the scale-down meaningless.
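To make the effect concrete, here is a minimal standalone sketch (plain Java, no Flink APIs; the class and method names are hypothetical and the placement rule is a simplified stand-in for the actual assigner) that simulates a pure state-locality placement for this 100 -> 10 rescale and counts how many TMs stay in use:

```java
import java.util.HashSet;
import java.util.Set;

/** Standalone sketch of the 100 -> 10 rescale scenario (not Flink code). */
public class StateLocalityRescaleSketch {

    public static void main(String[] args) {
        final int slotsPerTm = 10;      // taskmanager.numberOfTaskSlots
        final int oldParallelism = 100;
        final int newParallelism = 10;

        // Before rescaling, old subtask i ran on TM (i / slotsPerTm):
        // TM0 holds the state of old subtasks 0..9, TM1 of 10..19, etc.
        int[] oldSubtaskToTm = new int[oldParallelism];
        for (int i = 0; i < oldParallelism; i++) {
            oldSubtaskToTm[i] = i / slotsPerTm;
        }

        // Pure state-locality placement: each new subtask goes to the TM
        // that holds the largest share of its state.
        int oldPerNew = oldParallelism / newParallelism;
        Set<Integer> usedTms = new HashSet<>();
        for (int n = 0; n < newParallelism; n++) {
            int bestTm = tmWithMostState(n, oldPerNew, oldSubtaskToTm);
            usedTms.add(bestTm);
            System.out.printf("new subtask%d -> TM%d%n", n, bestTm);
        }

        // Prints 10: every TM keeps exactly one occupied slot, so none
        // can be released even though a single TM would suffice.
        System.out.println("TMs still in use after rescale: " + usedTms.size());
    }

    /** Returns the TM holding the most state for the given new subtask. */
    private static int tmWithMostState(int newSubtask, int oldPerNew, int[] oldSubtaskToTm) {
        int[] stateOnTm = new int[oldSubtaskToTm.length]; // indexed by TM id
        for (int old = newSubtask * oldPerNew; old < (newSubtask + 1) * oldPerNew; old++) {
            stateOnTm[oldSubtaskToTm[old]]++;
        }
        int best = 0;
        for (int tm = 1; tm < stateOnTm.length; tm++) {
            if (stateOnTm[tm] > stateOnTm[best]) {
                best = tm;
            }
        }
        return best;
    }
}
```

Under this placement every TM ends up with exactly one of its 10 slots occupied, which is why the scale-down frees no resources; a strategy that also weighs slot packing (concentrating the 10 new subtasks on one TM) would let the other 9 TMs be released.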