Yang-LI-CS commented on PR #787:
URL: https://github.com/apache/flink-kubernetes-operator/pull/787#issuecomment-1978713687

   > Hi @Yang-LI-CS! Thank you for the PR. The general idea of using empty task 
slots is good, but the devil is in the details. First of all, we need to ensure 
that we apply the adjustment at the right time. Then, it needs to work with 
slot sharing disabled. Finally, there is already a parallelism correction based 
on the number of key groups which this feature should not interfere with.
   > 
   > Overall, these constraints still make this feature useful if they are 
satisfied, but it introduces quite a bit of complexity. I would like us to 
take a step back and ask what we will achieve from filling all the available 
slots. Can we expect TaskManager load to be better balanced or scaling 
decisions to be more stable?
   
   Hi @mxm, thanks for the review!
   
   > it needs to work with slot sharing disabled. 
   
   I totally agree with this, thanks for reminding me; indeed, my Flink 
cluster does not use slot sharing.
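
   To make sure I keep that in mind, here is why it matters as I understand it 
(my own illustration, not code from this PR or the operator): with the default 
slot sharing group, the slots a job needs roughly equal the largest vertex 
parallelism, while with slot sharing disabled they are the sum of all vertex 
parallelisms, so any "fill the empty slots" logic has to handle both cases.

   ```java
   // Rough illustration (my own sketch, not code from this PR or the operator):
   // how many task slots a job needs depending on slot sharing.
   static int requiredSlots(int[] vertexParallelisms, boolean slotSharingEnabled) {
       int sum = 0;
       int max = 0;
       for (int p : vertexParallelisms) {
           sum += p;               // without sharing, every subtask needs its own slot
           max = Math.max(max, p); // with sharing, subtasks of different vertices share a slot
       }
       return slotSharingEnabled ? max : sum;
   }
   ```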
   
   > there is already a parallelism correction based on the number of key 
groups which this feature should not interfere with
   
   I'm going to review the code to understand this parallelism correction 🙏 
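
   In the meantime, here is how I currently understand that correction (a rough 
sketch of the idea only, not the actual operator code): for a keyed vertex, the 
autoscaler prefers a parallelism that evenly divides the number of key groups 
(i.e. the max parallelism), so that key groups are spread uniformly across 
subtasks.

   ```java
   // Rough sketch of the key-group based correction as I understand it
   // (hypothetical helper, not the real implementation in the operator).
   static int correctForKeyGroups(int targetParallelism, int maxParallelism) {
       // maxParallelism corresponds to the number of key groups of the vertex.
       for (int p = targetParallelism; p <= maxParallelism; p++) {
           if (maxParallelism % p == 0) {
               return p; // key groups divide evenly across p subtasks
           }
       }
       return maxParallelism;
   }
   ```

   So the slot-filling adjustment would need to be applied in a way that does 
not undo this divisor-based choice.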
   
   > I would like us to take a step back and ask what we will achieve from 
filling all the available slots. Can we expect TaskManager load to be better 
balanced or scaling decisions to be more stable?
   
   In my use case, the job graph comprises only 6 operators, and each 
TaskManager provides 6 task slots. Before this improvement, with the maximum 
parallelism set to 18, the autoscaler frequently rescaled the job to varying 
parallelism levels for all vertices, such as 7, 8, 13, 15, and 16. With this 
enhancement, the biggest vertex is only rescaled to parallelism levels of 6, 
12, and 18. Other vertices may still be rescaled to levels like 7, 13, or 15, 
but the overall frequency of rescalings triggered by the Flink autoscaler has 
decreased significantly.
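   
   The effect I am after is essentially the following (a minimal sketch with a 
hypothetical helper, not the code in this PR, and only for the vertex that 
drives the number of TaskManagers): round its target parallelism up to the 
next multiple of the slots per TaskManager, capped at the configured max 
parallelism, so that the allocated TaskManagers are fully used.
   
   ```java
   // Minimal sketch of the slot-filling idea (hypothetical helper, not this PR's code).
   static int fillTaskSlots(int targetParallelism, int slotsPerTaskManager, int maxParallelism) {
       // Round up to the next multiple of slotsPerTaskManager, capped at maxParallelism.
       int rounded =
               ((targetParallelism + slotsPerTaskManager - 1) / slotsPerTaskManager)
                       * slotsPerTaskManager;
       return Math.min(rounded, maxParallelism);
   }
   ```
   
   With 6 slots per TaskManager and a max parallelism of 18, a target of 7 
becomes 12 and a target of 13 becomes 18, which matches the 6/12/18 levels I 
now observe for the biggest vertex.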
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
