Kubernetes operator questions

santhosh venkat Tue, 25 Feb 2025 16:16:20 -0800

Hi,

I am new to the Flink Kubernetes Operator. We are planning to adopt it at
my company, and it would be great if someone could help answer my
questions.


   1. It seems that the Kubernetes operator is coupled with the autoscaler.
   The operator manages the lifecycle of Flink jobs in a Kubernetes cluster,
   and the autoscaler scales these jobs depending on the catch-up-time and
   busy-time configurations. I am trying to understand why the two are
   coupled. The coupling might result in a lack of failure isolation (an
   autoscaler failure affecting deployments, and vice versa), no ability to
   scale the operator independently of the autoscaler, and the deployment
   of two otherwise independent components being tied together. Am I
   understanding this correctly, or am I missing something?
   2. Pluggability of auto-scaling policies: Currently the auto-scaling
   policies are not pluggable, i.e., only a single scaling logic is executed
   as part of the reconciliation loop, alongside job deployments. Would it
   be acceptable if we developed this support (making auto-scaling policies
   pluggable) in the operator and contributed it upstream?
   3. Metrics storage: The metrics that the autoscaler uses are stored in a
   ConfigMap in Kubernetes. There is a 1 MB limit on the size of ConfigMaps
   in our cluster; wouldn't this be a bottleneck? I am trying to understand
   why this is the default option in the operator. Even though the metric
   storage is pluggable, I would like to understand the rationale behind
   this choice.
   4. In-place rescaling is supported only in native mode (Kubernetes mode)
   and not in autoscaler standalone mode. Would it be okay if we developed
   this support and contributed it upstream?
   5. The operator does not support creation of Flink session clusters. We
   have SQL use cases with Jupyter notebooks for which this might be
   necessary (for testing purposes). Would it be possible for us to develop
   this support in the operator and contribute it upstream?
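
To make question 2 more concrete, below is a rough sketch of the kind of
plug-in point we have in mind. It is written in Python only for brevity (the
operator itself is Java), and every name in it (VertexMetrics, ScalingPolicy,
UtilizationPolicy, FixedPolicy, reconcile) is hypothetical. None of it is
taken from the operator's actual API, and the scaling formula is a simplified
stand-in, not the autoscaler's real algorithm.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VertexMetrics:
    """Hypothetical per-vertex snapshot; field names are illustrative only."""
    parallelism: int       # current parallelism of the vertex
    busy_ratio: float      # fraction of time the vertex was busy, 0.0 to 1.0
    max_parallelism: int   # upper bound the policy must respect


class ScalingPolicy(ABC):
    """Hypothetical plug-in point: maps observed metrics to a parallelism."""

    @abstractmethod
    def target_parallelism(self, metrics: VertexMetrics) -> int:
        ...


class UtilizationPolicy(ScalingPolicy):
    """Scales so that busy time moves toward a target utilization."""

    def __init__(self, target_utilization: float) -> None:
        if not 0.0 < target_utilization <= 1.0:
            raise ValueError("target_utilization must be in (0, 1]")
        self.target_utilization = target_utilization

    def target_parallelism(self, metrics: VertexMetrics) -> int:
        # Simplified assumption: load scales linearly with parallelism.
        scale = metrics.busy_ratio / self.target_utilization
        wanted = math.ceil(metrics.parallelism * scale)
        # Clamp to the valid range [1, max_parallelism].
        return max(1, min(wanted, metrics.max_parallelism))


class FixedPolicy(ScalingPolicy):
    """Alternative policy that never changes parallelism."""

    def target_parallelism(self, metrics: VertexMetrics) -> int:
        return metrics.parallelism


def reconcile(policy: ScalingPolicy,
              metrics: Dict[str, VertexMetrics]) -> Dict[str, int]:
    """One pass of a hypothetical loop: the policy is swappable per job."""
    return {vertex: policy.target_parallelism(m)
            for vertex, m in metrics.items()}
```

The point of the sketch is only that reconcile() depends on the abstract
ScalingPolicy, so a different policy could be selected per deployment
without touching the reconciliation code.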

Most of these questions come from my experience of experimenting with the
operator locally using the Helm chart and the sample deployment YAML files.
They might not be accurate, and I am happy to be corrected.

Thanks.
