[DISCUSS] Extract core autoscaling algorithm as new SubModule in flink-kubernetes-operator

Samrat Deb Thu, 16 Feb 2023 01:27:24 -0800

Hi,

*Context:*
Autoscaling was introduced in Flink as part of FLIP-271[1].
One of the important aspects it discusses is providing a robust default
scaling algorithm, which should:
      a. Ensure scaling yields effective usage of assigned task slots.
      b. Ramp up in case of any backlog to ensure it gets processed in a
timely manner.
      c. Minimize the number of scaling decisions to prevent costly rescale
operations.
The FLIP intends to add an autoscaling framework based on 6 major metrics,
with different types of thresholds to trigger scaling.


Thread[2] discusses a different problem: why the autoscaler is part of the
operator instead of the JobManager at runtime.
The community decided to keep the autoscaling logic in the
flink-kubernetes-operator.

*Proposal:*
In this discussion, I want to propose extracting the autoscaling logic into
a new submodule in the flink-kubernetes-operator repository[3], which would
be independent of any resource manager or operator.
Currently the autoscaling algorithm is very tightly coupled with the
Kubernetes API.
This makes the core autoscaling algorithm hard to extend to other
resource managers such as YARN, Mesos, etc.
A separate autoscaling module inside the flink-kubernetes-operator would
let other resource managers leverage the autoscaling logic.
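
To make the idea more concrete, here is a minimal sketch of what such a
separation could look like. All names here (AutoscalerSketch, ScalingExecutor,
CoreAutoscaler, targetParallelism) are hypothetical and do not exist in the
repository today, and the scaling rule is only a placeholder (current
parallelism scaled by target rate over processing rate), not the algorithm
from FLIP-271. The point is only that the core decision logic depends on a
small interface, and each resource manager supplies its own implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types exist in flink-kubernetes-operator. */
public class AutoscalerSketch {

    /** Resource-manager-specific part (Kubernetes, YARN, ...): applies a decision. */
    interface ScalingExecutor {
        void rescale(String jobId, String vertexId, int newParallelism);
    }

    /** Resource-manager-agnostic part: only decides, never talks to a cluster API. */
    static final class CoreAutoscaler {
        private final ScalingExecutor executor;

        CoreAutoscaler(ScalingExecutor executor) {
            this.executor = executor;
        }

        /** Placeholder rule (an assumption): scale by targetRate / processingRate, rounded up. */
        static int targetParallelism(int current, double targetRate, double processingRate) {
            if (current < 1 || processingRate <= 0) {
                return Math.max(1, current); // no usable signal, keep what we have
            }
            return Math.max(1, (int) Math.ceil(current * targetRate / processingRate));
        }

        /** Returns true if a rescale was requested through the executor. */
        boolean evaluate(String jobId, String vertexId, int current,
                         double targetRate, double processingRate) {
            int target = targetParallelism(current, targetRate, processingRate);
            if (target == current) {
                return false; // avoid a costly rescale when nothing changes
            }
            executor.rescale(jobId, vertexId, target);
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a non-Kubernetes backend: it just records the requests.
        List<String> calls = new ArrayList<>();
        ScalingExecutor recordingExecutor =
                (job, vertex, p) -> calls.add(job + "/" + vertex + " -> " + p);

        CoreAutoscaler autoscaler = new CoreAutoscaler(recordingExecutor);
        autoscaler.evaluate("job-1", "source", 4, 1500.0, 1000.0);
        System.out.println(calls);
    }
}
```

With this shape, the Kubernetes operator would provide one ScalingExecutor
and a YARN integration could provide another, without touching the core
module.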

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
[2] https://lists.apache.org/thread/pvfb3fw99mj8r1x8zzyxgvk4dcppwssz
[3] https://github.com/apache/flink-kubernetes-operator


Bests,
Samrat
