[jira] [Created] (FLINK-36673) Operator is not properly handling failed deployments without savepoints

Yaroslav Tkachenko (Jira) Thu, 07 Nov 2024 15:52:22 -0800

Yaroslav Tkachenko created FLINK-36673:
------------------------------------------


             Summary: Operator is not properly handling failed deployments 
without savepoints
                 Key: FLINK-36673
                 URL: https://issues.apache.org/jira/browse/FLINK-36673
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
            Reporter: Yaroslav Tkachenko
         Attachments: stacktrace.txt

I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10.

When I deploy a FlinkDeployment that fails during startup, the operator throws 
"ReconciliationException: Could not observe latest savepoint information" (the 
full stacktrace is attached). 

I think the issue was introduced here: 
[https://github.com/apache/flink-kubernetes-operator/pull/871]. 
*AbstractFlinkService.getLastCheckpoint* now throws a *ReconciliationException* 
when a savepoint is not available, and 
*SnapshotObserver.observeLatestCheckpoint* doesn't handle it properly. I think 
having no savepoint is completely normal in some situations (e.g. a brand new 
job). 
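
To illustrate the behavior I would expect, here is a minimal, self-contained sketch. It does not use the operator's real classes: ReconciliationException, SnapshotLookup and the method signatures below are simplified stand-ins that I made up for illustration, and the actual fix may look quite different (for example, it may need to tell "no snapshot exists" apart from a genuine lookup failure rather than catching everything).

```java
import java.util.Optional;

public class MissingSnapshotSketch {

    /** Hypothetical stand-in for the operator's ReconciliationException. */
    static class ReconciliationException extends RuntimeException {
        ReconciliationException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the service call that looks up the last snapshot. */
    interface SnapshotLookup {
        String getLastCheckpoint(String jobId);
    }

    /**
     * Assumed observer behavior: a missing snapshot yields an empty result
     * instead of propagating the exception and failing reconciliation.
     * Note: catching the exception this broadly would also hide real
     * lookup errors; this is only a sketch of the intended outcome.
     */
    static Optional<String> observeLatestCheckpoint(SnapshotLookup lookup, String jobId) {
        try {
            return Optional.ofNullable(lookup.getLastCheckpoint(jobId));
        } catch (ReconciliationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A brand new job: the lookup fails because no snapshot exists yet.
        SnapshotLookup brandNewJob = jobId -> {
            throw new ReconciliationException("Could not observe latest savepoint information");
        };
        // A job that already has a snapshot (the path is a made-up example).
        SnapshotLookup jobWithSnapshot = jobId -> "s3://bucket/savepoints/sp-1";

        System.out.println(observeLatestCheckpoint(brandNewJob, "job-a").isPresent());
        System.out.println(observeLatestCheckpoint(jobWithSnapshot, "job-b").isPresent());
    }
}
```

With this shape, a deployment that fails before its first checkpoint or savepoint is simply reported as having no snapshot, and the rest of the reconciliation can continue.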



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
