Hello,

We recently upgraded the operator to 1.8.0 to leverage the new autoscaling features (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/custom-resource/autoscaler/). The FlinkDeployment (application cluster) is set to Flink v1_18 as well. I can see the following event being reported in the operator logs:

o.a.f.k.o.l.AuditUtils [INFO ][flink/devpipeline] >>> Event | Info | SCALINGREPORT | Scaling execution enabled, begin scaling vertices:{ Vertex ID xxxxxxxx | Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target data rate 7.85}{ Vertex ID yyyyyyyy | Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target data rate 0.00}{ Vertex ID zzzzzzzz | Parallelism 2 -> 1 | Processing capacity Infinity -> Infinity | Target data rate 7.85}{ Vertex ID wwwwwwwww | Parallelism 2 -> 1 | Processing capacity 33235.72 -> 13294.29 | Target data rate 6.65}

But the in-place autoscaling is not getting triggered. My understanding is that the autoscaler running within the k8s-operator should call the rescale API endpoint of the FlinkDeployment (devpipeline) with a parallelism overrides map (vertexId => parallelism) and that should trigger a redeploy of the jobGraph. But that is not happening. The restart of the FlinkDeployment overrides the map (vertexId => parallelism) in the configMap resource that stores the flink-config.

Am I missing something? How do I debug this further?

Here is the flink-config set within the k8s-operator:

job.autoscaler.stabilization.interval: 1m
job.autoscaler.target.utilization: 0.6
job.autoscaler.target.utilization.boundary: 0.2
pipeline.max-parallelism: 180
jobmanager.scheduler: adaptive

Thank you
Chetas
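
For comparing the SCALINGREPORT event above with what ends up in the flink-config configMap, here is a minimal Python sketch that pulls the (vertexId => parallelism) map out of the event text. The function names are hypothetical helpers, not part of the operator. The output format is an assumption: it renders the map as comma-separated "vertexId:parallelism" pairs, on the assumption that the overrides are stored that way under a key such as pipeline.jobvertex-parallelism-overrides.

```python
import re

# Matches one "{ Vertex ID <id> | Parallelism <old> -> <new> | ..." entry
# in the SCALINGREPORT event text quoted in the log line above.
VERTEX_RE = re.compile(r"Vertex ID (\S+) \| Parallelism (\d+) -> (\d+)")


def parse_scaling_report(event: str) -> dict[str, int]:
    """Return {vertex_id: target_parallelism} from a SCALINGREPORT event."""
    return {vid: int(new) for vid, _old, new in VERTEX_RE.findall(event)}


def format_overrides(overrides: dict[str, int]) -> str:
    """Render the map as "vertexId:parallelism" pairs (assumed format)."""
    return ",".join(f"{vid}:{p}" for vid, p in overrides.items())


# Shortened copy of the event from the operator log (placeholder vertex IDs).
event = (
    "Scaling execution enabled, begin scaling vertices:"
    "{ Vertex ID xxxxxxxx | Parallelism 2 -> 1 | Processing capacity "
    "Infinity -> Infinity | Target data rate 7.85}"
    "{ Vertex ID wwwwwwwww | Parallelism 2 -> 1 | Processing capacity "
    "33235.72 -> 13294.29 | Target data rate 6.65}"
)

overrides = parse_scaling_report(event)
print(format_overrides(overrides))
```

The printed string can then be compared by hand with the overrides entry in the configMap after the FlinkDeployment restarts, to check whether the values the autoscaler reported are the ones the job actually receives.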