Hello,

We recently upgraded the operator to 1.8.0 to leverage the new autoscaling
features
(https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/custom-resource/autoscaler/).
The FlinkDeployment (application cluster) also uses Flink v1_18. I can see
the following event reported in the operator logs:

o.a.f.k.o.l.AuditUtils         [INFO ][flink/devpipeline] >>> Event  | Info  | SCALINGREPORT   | Scaling execution enabled, begin scaling vertices:
{ Vertex ID xxxxxxxx | Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target data rate 7.85}
{ Vertex ID yyyyyyyy | Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target data rate 0.00}
{ Vertex ID zzzzzzzz | Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target data rate 7.85}
{ Vertex ID wwwwwwwww | Parallelism 2 -> 1 | Processing capacity 33235.72 -> 13294.29 | Target data rate 6.65}
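
To compare what the autoscaler decided with what actually ends up in the
ConfigMap or on the running job, it can help to pull the per-vertex
parallelism changes out of the SCALINGREPORT message. A minimal sketch,
assuming the report keeps the "{ Vertex ID ... | Parallelism A -> B | ...}"
shape seen above; `parse_scaling_report` is a hypothetical helper, not part
of the operator:

```python
import re

# Matches one "{ Vertex ID <id> | Parallelism <old> -> <new>" group.
# Assumes the SCALINGREPORT layout shown in the log above.
VERTEX_RE = re.compile(
    r"\{\s*Vertex ID (?P<vid>\S+)\s*\|\s*Parallelism (?P<old>\d+) -> (?P<new>\d+)"
)


def parse_scaling_report(message: str) -> dict[str, tuple[int, int]]:
    """Return {vertex_id: (current_parallelism, target_parallelism)}."""
    # Mail clients wrap long log lines anywhere, so collapse whitespace first.
    flat = " ".join(message.split())
    return {
        m.group("vid"): (int(m.group("old")), int(m.group("new")))
        for m in VERTEX_RE.finditer(flat)
    }


report = (
    "Scaling execution enabled, begin scaling vertices:{ Vertex ID xxxxxxxx "
    "| Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target "
    "data rate 7.85}{ Vertex ID wwwwwwwww | Parallelism 2 -> 1 | Processing "
    "capacity 33235.72 -> 13294.29 | Target data rate 6.65}"
)
overrides = {vid: new for vid, (old, new) in parse_scaling_report(report).items()}
print(overrides)  # {'xxxxxxxx': 1, 'wwwwwwwww': 1}
```

The resulting `overrides` dict is the vertexId => parallelism map that should
eventually be visible on the job, so it gives a concrete thing to diff against.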

However, in-place rescaling is not being triggered. My understanding is
that the autoscaler running within the Kubernetes operator should call the
rescale API endpoint of the FlinkDeployment (devpipeline) with a parallelism
overrides map (vertexId => parallelism), and that this should trigger a
redeploy of the JobGraph. That is not happening. Restarting the
FlinkDeployment overrides the map (vertexId => parallelism) in the
ConfigMap resource that stores the Flink configuration.
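
One way to narrow this down is to check the JobManager side directly. Flink
1.18 exposes `GET` and `PUT /jobs/<jobid>/resource-requirements` on its REST
API for jobs running with the adaptive scheduler. Whether the operator's
autoscaler uses exactly this endpoint for in-place scaling is an assumption
here. The sketch below reads the current per-vertex requirements and can
submit a vertexId => parallelism map by hand; `rest_url`, `job_id`, and the
choice to pin lowerBound and upperBound to the same value are illustrative
assumptions, not taken from the operator:

```python
import json
import urllib.request


def resource_requirements_body(overrides: dict[str, int]) -> dict:
    """Build a resource-requirements payload from {vertex_id: parallelism}.

    Assumption: both bounds are pinned to the target parallelism.
    """
    return {
        vid: {"parallelism": {"lowerBound": p, "upperBound": p}}
        for vid, p in overrides.items()
    }


def get_resource_requirements(rest_url: str, job_id: str) -> dict:
    """Read the per-vertex requirements currently known to the JobManager."""
    url = f"{rest_url}/jobs/{job_id}/resource-requirements"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def put_resource_requirements(rest_url: str, job_id: str,
                              overrides: dict[str, int]) -> int:
    """Submit new requirements; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{rest_url}/jobs/{job_id}/resource-requirements",
        data=json.dumps(resource_requirements_body(overrides)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `get_resource_requirements("http://localhost:8081", "<job-id>")`
(8081 being Flink's default REST port) shows whether the upper bounds already
reflect the scaling report. If they still read 2, the request from the
operator probably never reached the JobManager. If a manual PUT rescales the
job, the JobManager side is fine and the problem is more likely on the
operator side.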

Am I missing something? How do I debug this further?

Here is the Flink configuration set within the Kubernetes operator:

job.autoscaler.stabilization.interval: 1m
job.autoscaler.target.utilization: 0.6
job.autoscaler.target.utilization.boundary: 0.2
pipeline.max-parallelism: 180
jobmanager.scheduler: adaptive


Thank you
Chetas
