Hi,

*Context:*

Auto scaling was introduced in Flink as part of FLIP-271 [1]. The FLIP discusses the important aspects of providing a robust default scaling algorithm:

a. Ensure scaling yields effective usage of assigned task slots.
b. Ramp up in case of any backlog to ensure it gets processed in a timely manner.
c. Minimize the number of scaling decisions to prevent costly rescale operations.

The FLIP intends to add an auto scaling framework based on 6 major metrics and contains different types of thresholds to trigger scaling.
Thread [2] discusses a different question: why the autoscaler is part of the operator instead of the JobManager at runtime. The community decided to keep the autoscaling logic in the flink-kubernetes-operator.

*Proposal:*

In this discussion, I want to put forward the idea of extracting the autoscaling logic into a new submodule in the flink-kubernetes-operator repository [3], which would be independent of any resource manager or operator. Currently the autoscaling algorithm is tightly coupled with the Kubernetes API. This makes the core autoscaling algorithm hard to extend to other resource managers such as YARN, Mesos, etc. A separate autoscaling module inside the flink-kubernetes-operator would help other resource managers leverage the autoscaling logic.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
[2] https://lists.apache.org/thread/pvfb3fw99mj8r1x8zzyxgvk4dcppwssz
[3] https://github.com/apache/flink-kubernetes-operator

Bests,
Samrat
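
P.S. To make the idea of a resource-manager-independent module a bit more concrete, here is a rough sketch of the kind of separation I have in mind. All names (`ScalingExecutor`, `SimpleAutoscaler`, `InMemoryExecutor`) are hypothetical and do not exist in the operator today, and the scaling rule is a toy placeholder, not the FLIP-271 algorithm. The only point is that the core logic depends on a small interface rather than on Kubernetes classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: the only surface through which the core
// algorithm talks to a resource manager (Kubernetes, YARN, ...).
interface ScalingExecutor {
    int currentParallelism(String vertexId);
    void rescale(String vertexId, int newParallelism);
}

// Resource-manager-agnostic core. It depends on ScalingExecutor only.
final class SimpleAutoscaler {
    private final double targetUtilization;
    private final int maxParallelism;

    SimpleAutoscaler(double targetUtilization, int maxParallelism) {
        this.targetUtilization = targetUtilization;
        this.maxParallelism = maxParallelism;
    }

    // Toy rule (an assumption for illustration): pick the parallelism that
    // would bring the observed utilization down or up to the target.
    int computeParallelism(int current, double observedUtilization) {
        int desired = (int) Math.ceil(current * observedUtilization / targetUtilization);
        return Math.max(1, Math.min(maxParallelism, desired));
    }

    // Returns true only if a rescale was actually requested, so callers
    // can avoid counting no-op decisions.
    boolean evaluate(String vertexId, double observedUtilization, ScalingExecutor executor) {
        int current = executor.currentParallelism(vertexId);
        int desired = computeParallelism(current, observedUtilization);
        if (desired == current) {
            return false;
        }
        executor.rescale(vertexId, desired);
        return true;
    }
}

// Stand-in backend for the sketch. A real implementation would call the
// Kubernetes or YARN API in rescale(); this one just updates a map.
final class InMemoryExecutor implements ScalingExecutor {
    private final Map<String, Integer> parallelism = new HashMap<>();

    InMemoryExecutor(String vertexId, int initialParallelism) {
        parallelism.put(vertexId, initialParallelism);
    }

    @Override
    public int currentParallelism(String vertexId) {
        // Assumes the vertex is known; an unknown id would fail here.
        return parallelism.get(vertexId);
    }

    @Override
    public void rescale(String vertexId, int newParallelism) {
        parallelism.put(vertexId, newParallelism);
    }
}
```

With a target utilization of 0.5, a vertex running at parallelism 4 and observed at 0.75 utilization would be rescaled to 6 under this toy rule. Supporting another resource manager would then mean providing another `ScalingExecutor` implementation, without touching the core class.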