It seems the operator didn't restart automatically after the ConfigMap was changed. After a rollout restart, the exception disappeared. Never mind this issue. Thanks.
On Tue, Nov 21, 2023 at 11:31 AM Xiaolong Wang <xiaolong.w...@smartnews.com> wrote:

> Hi,
>
> Recently I upgraded the flink-kubernetes-operator from 1.4.0 to 1.6.1 to
> use Flink 1.18. After that, the operator kept reporting the following
> exception:
>
>> 2023-11-21 03:26:50,505 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][sn-push/sn-push-decision-maker-log-s3-hive-prd] Resource fully reconciled, nothing to do...
>>
>> 2023-11-21 03:26:50,727 o.a.f.r.r.RestClient [WARN ][realtime-streaming/realtime-perf-report-main-prd-test] Rest endpoint shutdown failed.
>>
>> java.util.concurrent.TimeoutException
>>     at java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
>>     at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
>>     at org.apache.flink.runtime.rest.RestClient.shutdown(RestClient.java:227)
>>     at org.apache.flink.client.program.rest.RestClusterClient.close(RestClusterClient.java:270)
>>     at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getTaskManagersInfo(AbstractFlinkService.java:925)
>>     at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getClusterInfo(AbstractFlinkService.java:621)
>>     at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeClusterInfo(AbstractFlinkDeploymentObserver.java:85)
>>     at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:75)
>>     at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:49)
>>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:129)
>>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:56)
>>     at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:138)
>>     at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:96)
>>     at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>>     at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:95)
>>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:139)
>>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:119)
>>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:89)
>>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:62)
>>     at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:414)
>>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.base/java.lang.Thread.run(Unknown Source)
>
> I tried to increase the rest timeout param of
> "job.autoscaler.flink.rest-client.timeout" to 60 s, yet it does not
> resolve the issue.
>
> Could you help check this out? Thanks in advance.
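For context on the WARN entry above: the trace shows `RestClient.shutdown` calling `CompletableFuture.get`, which goes through `timedGet` and throws `java.util.concurrent.TimeoutException` when the future has not completed before the wait expires. The snippet below is a minimal sketch of that bounded-wait pattern only. The class and method names (`TimedShutdownSketch`, `awaitShutdown`) are hypothetical, and this is not Flink's actual `RestClient` code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedShutdownSketch {

    // Hypothetical helper (not Flink's RestClient): block for at most timeoutMs
    // waiting on a shutdown future. Returns true only if it completed normally.
    public static boolean awaitShutdown(CompletableFuture<Void> shutdownFuture, long timeoutMs) {
        try {
            // get(timeout, unit) throws TimeoutException if the future is still
            // incomplete when the timeout elapses (the timedGet frame in the trace).
            shutdownFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Log and return instead of propagating, mirroring the WARN-level log line.
            System.err.println("WARN Rest endpoint shutdown failed: " + e);
            return false;
        } catch (ExecutionException e) {
            // The shutdown itself completed exceptionally.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the calling thread.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> neverCompletes = new CompletableFuture<>();
        CompletableFuture<Void> alreadyDone = CompletableFuture.completedFuture(null);
        System.out.println(awaitShutdown(neverCompletes, 100)); // false, after about 100 ms
        System.out.println(awaitShutdown(alreadyDone, 100));    // true
    }
}
```

A longer timeout only changes how long such a wait lasts if the new value is actually loaded. In this thread the exception went away only after the operator was restarted, which suggests the edited ConfigMap value had not been picked up until then.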