yuanfenghu created FLINK-35823:
----------------------------------

             Summary: Introduce parameters to control the upper limit of 
rescale to avoid unlimited shrinkage due to server-side bottlenecks or data 
skew.
                 Key: FLINK-35823
                 URL: https://issues.apache.org/jira/browse/FLINK-35823
             Project: Flink
          Issue Type: Improvement
          Components: Autoscaler
            Reporter: yuanfenghu
             Fix For: 2.0.0


1. Suppose a Flink application writes to an external storage system such as 
HDFS or Kafka, and that external system becomes the bottleneck of the whole 
job. For example, when HDFS throughput drops, write I/O time increases and the 
busy time metric of the corresponding Flink task increases with it. The 
autoscaler then concludes that parallelism must be increased to raise the write 
rate. Because the bottleneck is on the external server, the extra parallelism 
does not help, so the next evaluation cycle increases parallelism again, and 
this repeats until parallelism = max-parallelism.

2. Data skew in some tasks causes the same problem.

 
Therefore, we should introduce a new parameter-controlled check: if throughput 
stays essentially the same while parallelism keeps increasing, there is no need 
to scale up any further.
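
A minimal sketch of what such a check could look like, written as standalone 
Java. Everything here is an assumption made for illustration: the class name 
ScaleUpGuard, the parameters minRelativeGain and maxIneffectiveScaleUps, and 
the idea of keeping a per-vertex history are hypothetical and are not part of 
the existing autoscaler API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical guard, for illustration only; not part of the Flink autoscaler. */
final class ScaleUpGuard {

    /** Parallelism in effect and the throughput measured under it. */
    private static final class Observation {
        final int parallelism;
        final double recordsPerSecond;

        Observation(int parallelism, double recordsPerSecond) {
            this.parallelism = parallelism;
            this.recordsPerSecond = recordsPerSecond;
        }
    }

    // Hypothetical parameters: the minimum relative throughput gain that counts
    // as an effective scale-up, and how many ineffective scale-ups in a row
    // are tolerated before further scale-ups are blocked.
    private final double minRelativeGain;
    private final int maxIneffectiveScaleUps;
    private final List<Observation> history = new ArrayList<>();

    ScaleUpGuard(double minRelativeGain, int maxIneffectiveScaleUps) {
        this.minRelativeGain = minRelativeGain;
        this.maxIneffectiveScaleUps = maxIneffectiveScaleUps;
    }

    /** Records the throughput observed after running at the given parallelism. */
    void record(int parallelism, double recordsPerSecond) {
        history.add(new Observation(parallelism, recordsPerSecond));
    }

    /** Returns true if the most recent consecutive scale-ups all failed to raise throughput. */
    boolean shouldBlockScaleUp() {
        int ineffective = 0;
        // Walk backwards over adjacent pairs of observations.
        for (int i = history.size() - 1; i > 0; i--) {
            Observation current = history.get(i);
            Observation previous = history.get(i - 1);
            if (current.parallelism <= previous.parallelism) {
                break; // not a scale-up, so the streak ends here
            }
            double gain = (current.recordsPerSecond - previous.recordsPerSecond)
                    / previous.recordsPerSecond;
            if (gain >= minRelativeGain) {
                break; // this scale-up helped, so the streak ends here
            }
            ineffective++;
        }
        return ineffective >= maxIneffectiveScaleUps;
    }
}
```

With minRelativeGain = 0.10 and maxIneffectiveScaleUps = 2, a history of 
(4, 1000), (8, 1900), (16, 1950), (32, 1960) blocks the next scale-up: the 
last two doublings each raised throughput by less than 10%. The same check 
would cover both the external-bottleneck case and the data-skew case, since 
it looks only at whether added parallelism produced added throughput.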
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
