Jean-Benoit Hamard created FLINK-38095:
------------------------------------------

             Summary: Error in Session Job autoscaler when realizing 
parallelism overrides
                 Key: FLINK-38095
                 URL: https://issues.apache.org/jira/browse/FLINK-38095
             Project: Flink
          Issue Type: Bug
          Components: Autoscaler, Kubernetes Operator
    Affects Versions: kubernetes-operator-1.11.0
            Reporter: Jean-Benoit Hamard
             Fix For: kubernetes-operator-1.11.0


The Kubernetes operator v1.11 fails to apply autoscaler overrides to a session 
job, with the following error:


java.lang.NullPointerException
    at org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
    at org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
    at org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:166)
    at org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:209)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:132)
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:121)
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:58)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
    at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
    at java.lang.Thread.run(Thread.java)

This prevents the scaling parameters from being applied to the job, and the 
operator keeps looping on that error.

Here is my session job configuration:


job.autoscaler.catch-up.duration: 5m
job.autoscaler.enabled: "true"
job.autoscaler.metrics.window: 3m
job.autoscaler.restart.time: 2m
job.autoscaler.stabilization.interval: 1m
job.autoscaler.target.utilization.boundary: "0.2"
job.autoscaler.target.utilization: "0.6"
pipeline.max-parallelism: "720"
taskmanager.numberOfTaskSlots: "1"

I can provide more configuration or information if needed; don't hesitate to ask.

Thank you for your help.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
