Jean-Benoit Hamard created FLINK-38095:
------------------------------------------

             Summary: Error in Session Job autoscaler when realizing parallelism overrides
                 Key: FLINK-38095
                 URL: https://issues.apache.org/jira/browse/FLINK-38095
             Project: Flink
          Issue Type: Bug
          Components: Autoscaler, Kubernetes Operator
    Affects Versions: kubernetes-operator-1.11.0
            Reporter: Jean-Benoit Hamard
             Fix For: kubernetes-operator-1.11.0


The Kubernetes operator v1.11 fails to apply autoscaler overrides to a session job, with the following error:

java.lang.NullPointerException
    at org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
    at org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
    at org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:166)
    at org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:209)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:132)
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:121)
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:58)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
    at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
    at java.lang.Thread.run(Thread.java)

This prevents the scaling parameters from being applied to the job, and the operator keeps looping on that error.

Here is my session job configuration:

    job.autoscaler.catch-up.duration: 5m
    job.autoscaler.enabled: "true"
    job.autoscaler.metrics.window: 3m
    job.autoscaler.restart.time: 2m
    job.autoscaler.stabilization.interval: 1m
    job.autoscaler.target.utilization.boundary: "0.2"
    job.autoscaler.target.utilization: "0.6"
    pipeline.max-parallelism: "720"
    taskmanager.numberOfTaskSlots: "1"

I can provide more configuration details or information if needed; don't hesitate to ask.

Thank you for your help.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)