Hi,

I recently upgraded flink-kubernetes-operator from 1.4.0 to 1.6.1 in order to use Flink 1.18. Since then, the operator keeps reporting the following exception:

2023-11-21 03:26:50,505 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][sn-push/sn-push-decision-maker-log-s3-hive-prd] Resource fully reconciled, nothing to do...
2023-11-21 03:26:50,727 o.a.f.r.r.RestClient [WARN ][realtime-streaming/realtime-perf-report-main-prd-test] Rest endpoint shutdown failed.
java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
    at org.apache.flink.runtime.rest.RestClient.shutdown(RestClient.java:227)
    at org.apache.flink.client.program.rest.RestClusterClient.close(RestClusterClient.java:270)
    at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getTaskManagersInfo(AbstractFlinkService.java:925)
    at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getClusterInfo(AbstractFlinkService.java:621)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeClusterInfo(AbstractFlinkDeploymentObserver.java:85)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:75)
    at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:49)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:129)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:56)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:138)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:96)
    at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:95)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:139)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:119)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:89)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:62)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:414)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

I tried increasing the REST client timeout parameter "job.autoscaler.flink.rest-client.timeout" to 60 s, but that did not resolve the issue. Could you help look into this?

Thanks in advance.