Yaroslav Tkachenko created FLINK-36673:
------------------------------------------
Summary: Operator is not properly handling failed deployments
without savepoints
Key: FLINK-36673
URL: https://issues.apache.org/jira/browse/FLINK-36673
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Reporter: Yaroslav Tkachenko
Attachments: stacktrace.txt
I noticed an issue after upgrading the Flink Kubernetes Operator from 1.9 to
1.10. When I deploy a FlinkDeployment that fails during startup, I get a
"ReconciliationException: Could not observe latest savepoint information" (the
full stack trace is attached).
I think the issue was introduced here:
https://github.com/apache/flink-kubernetes-operator/pull/871
*AbstractFlinkService.getLastCheckpoint* now throws a *ReconciliationException*
when a savepoint is not available, and
*SnapshotObserver.observeLatestCheckpoint* doesn't handle it properly. I think
having no savepoint is completely normal in some situations (e.g. a brand new
job), so it should not be treated as a reconciliation error.
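
To illustrate the behaviour I would expect, here is a minimal, self-contained
sketch. It is not the operator's actual code: the FlinkService interface, the
Status class, the Optional-returning signature and the nested
ReconciliationException are all hypothetical stand-ins that only borrow the
method names from above. The idea is that "no snapshot yet" is reported as an
empty result, and only a genuine lookup failure becomes an exception.

```java
import java.util.Optional;

public class SnapshotObserverSketch {

    /** Hypothetical stand-in for the operator's ReconciliationException. */
    static class ReconciliationException extends RuntimeException {
        ReconciliationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical service interface; not the real AbstractFlinkService API. */
    interface FlinkService {
        // Returns the latest snapshot path, or empty when the job has none yet.
        Optional<String> getLastCheckpoint(String jobId) throws Exception;
    }

    /** Hypothetical status holder; null means no snapshot has been recorded. */
    static class Status {
        String lastSavepointPath;
    }

    static void observeLatestCheckpoint(FlinkService service, String jobId, Status status) {
        Optional<String> latest;
        try {
            latest = service.getLastCheckpoint(jobId);
        } catch (Exception e) {
            // Only a real lookup failure is surfaced as a reconciliation error.
            throw new ReconciliationException(
                    "Could not observe latest savepoint information", e);
        }
        // A missing snapshot (e.g. a brand new job) leaves the status untouched.
        latest.ifPresent(path -> status.lastSavepointPath = path);
    }

    public static void main(String[] args) {
        Status status = new Status();

        // Brand new job: no snapshot exists, and no exception is thrown.
        observeLatestCheckpoint(id -> Optional.empty(), "new-job", status);
        System.out.println("after new job: " + status.lastSavepointPath);

        // Job with a snapshot: the path is recorded in the status.
        observeLatestCheckpoint(id -> Optional.of("s3://bucket/chk-1"), "old-job", status);
        System.out.println("after old job: " + status.lastSavepointPath);
    }
}
```

Whether the real fix belongs in getLastCheckpoint (return an empty result) or
in the observer (catch and ignore the "not available" case) is an open
question; the sketch only shows the first option.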