[
https://issues.apache.org/jira/browse/FLINK-38095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089163#comment-18089163
]
Dennis-Mircea Ciupitu commented on FLINK-38095:
-----------------------------------------------
Hi, [~jbhamard]! I tried to reproduce it on Flink Kubernetes Operator 1.15.0.
Can you confirm whether this is still an issue and whether it is reproducible on
version 1.15.0? If so, please provide some more config/information.
> Error in Session Job autoscaler when realizing parallelism overrides
> --------------------------------------------------------------------
>
> Key: FLINK-38095
> URL: https://issues.apache.org/jira/browse/FLINK-38095
> Project: Flink
> Issue Type: Bug
> Components: Autoscaler, Kubernetes Operator
> Affects Versions: kubernetes-operator-1.11.0
> Reporter: Jean-Benoit Hamard
> Priority: Major
> Fix For: kubernetes-operator-1.11.0
>
>
> The Kubernetes operator v1.11 fails to apply autoscaler overrides to a
> session job, with the following error:
> java.lang.NullPointerException
> at org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:52)
> at org.apache.flink.kubernetes.operator.autoscaler.KubernetesScalingRealizer.realizeParallelismOverrides(KubernetesScalingRealizer.java:40)
> at org.apache.flink.autoscaler.JobAutoScalerImpl.applyParallelismOverrides(JobAutoScalerImpl.java:166)
> at org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:111)
> at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:209)
> at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:132)
> at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:121)
> at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:58)
> at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
> at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
> at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
> at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
> at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
> at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
> at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
> at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
> at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
> at java.lang.Thread.run(Thread.java)
> This prevents the scaling parameters from being applied to the job, and the
> operator keeps looping on that error.
> Here is my session job configuration:
> job.autoscaler.catch-up.duration: 5m
> job.autoscaler.enabled: "true"
> job.autoscaler.metrics.window: 3m
> job.autoscaler.restart.time: 2m
> job.autoscaler.stabilization.interval: 1m
> job.autoscaler.target.utilization.boundary: "0.2"
> job.autoscaler.target.utilization: "0.6"
> pipeline.max-parallelism: "720"
> taskmanager.numberOfTaskSlots: "1"
> I can provide more config/information if needed; don't hesitate to ask.
> Thank you for your help.