[ https://issues.apache.org/jira/browse/FLINK-30266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gyula Fora closed FLINK-30266.
------------------------------
    Resolution: Fixed

Merged to main 72ad3639e60fbf27dd408dabbe69f46d69ff52f9

> Recovery reconciliation loop fails if no checkpoint has been created yet
> ------------------------------------------------------------------------
>
>                 Key: FLINK-30266
>                 URL: https://issues.apache.org/jira/browse/FLINK-30266
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.3.0
>            Reporter: Maximilian Michels
>            Assignee: Gyula Fora
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.3.0
>
>
> When the upgradeMode is LAST-STATE, the operator fails to reconcile a failed
> application unless at least one checkpoint has already been created. The
> expected behavior would be that the job starts with empty state.
> {noformat}
> 2022-12-01 10:58:35,596 o.a.f.k.o.l.AuditUtils [INFO ] [app] >>> Status | Error | UPGRADING | {"type":"org.apache.flink.kubernetes.operator.exception.DeploymentFailedException","message":"HA metadata not available to restore from last state. It is possible that the job has finished or terminally failed, or the configmaps have been deleted. Manual restore required.","additionalMetadata":{"reason":"RestoreFailed"},"throwableList":[]}
> {noformat}
> {noformat}
> 2022-12-01 10:44:49,480 i.j.o.p.e.ReconciliationDispatcher [ERROR] [app] Error during event processing ExecutionScope{ resource id: ResourceID{name='app', namespace='namespace'}, version: 216933301} failed.
> org.apache.flink.kubernetes.operator.exception.ReconciliationException: java.lang.RuntimeException: This indicates a bug...
>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:133)
>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
>     at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
>     at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
>     at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>     at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
>     at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.RuntimeException: This indicates a bug...
>     at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.deploy(ApplicationReconciler.java:180)
>     at org.apache.flink.kubernetes.operator.reconciler.deployment.ApplicationReconciler.deploy(ApplicationReconciler.java:61)
>     at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractJobReconciler.restoreJob(AbstractJobReconciler.java:212)
>     at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractJobReconciler.reconcileSpecChange(AbstractJobReconciler.java:144)
>     at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:167)
>     at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:64)
>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:123)
>     ... 13 more
> {noformat}

--
This message was sent by Atlassian Jira (v8.20.10#820010)