Yaroslav Tkachenko created FLINK-36673:
------------------------------------------

             Summary: Operator is not properly handling failed deployments without savepoints
                 Key: FLINK-36673
                 URL: https://issues.apache.org/jira/browse/FLINK-36673
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
            Reporter: Yaroslav Tkachenko
         Attachments: stacktrace.txt


I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10. When I deploy a FlinkDeployment that fails during startup, I get a "ReconciliationException: Could not observe latest savepoint information" (the full stacktrace is attached).

I think the issue was introduced here: https://github.com/apache/flink-kubernetes-operator/pull/871

*AbstractFlinkService.getLastCheckpoint* now throws a *ReconciliationException* when a savepoint is not available, and *SnapshotObserver.observeLatestCheckpoint* doesn't handle it properly. I think having no savepoint is completely normal in some situations (e.g. a brand new job).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)