[jira] [Commented] (FLINK-36673) Operator is not properly handling failed deployments without savepoints

Gyula Fora (Jira) Sat, 01 Mar 2025 22:37:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-36673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931787#comment-17931787
 ]


Gyula Fora commented on FLINK-36673:
------------------------------------

I think this is a duplicate of 
https://issues.apache.org/jira/browse/FLINK-37370 and has been fixed on the 
main/release-11 branch.

Can you please confirm?

> Operator is not properly handling failed deployments without savepoints
> -----------------------------------------------------------------------
>
>                 Key: FLINK-36673
>                 URL: https://issues.apache.org/jira/browse/FLINK-36673
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Yaroslav Tkachenko
>            Priority: Major
>         Attachments: Screenshot 2025-02-28 at 4.15.26 PM.png, Screenshot 
> 2025-02-28 at 8.51.37 PM.png, Screenshot 2025-02-28 at 8.55.36 PM.png, 
> stacktrace.txt
>
>
> I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10.
> When I deploy a FlinkDeployment that fails during startup, I get a 
> "ReconciliationException: Could not observe latest savepoint information" 
> (full stacktrace is attached).
> I think the issue was introduced here: 
> https://github.com/apache/flink-kubernetes-operator/pull/871
> *AbstractFlinkService.getLastCheckpoint* now throws a 
> *ReconciliationException* when a savepoint is not available, and 
> *SnapshotObserver.observeLatestCheckpoint* doesn't handle it properly. I 
> think having no savepoint is completely normal in some situations (e.g. a 
> brand new job). 
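
For readers following the thread, the failure mode described in the report can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the operator's code and not the fix referenced in FLINK-37370: the class SavepointObserverSketch, the nested ReconciliationException, the job names, and the savepoint path are all hypothetical stand-ins. It shows a lookup that throws when no savepoint exists, and an observer that treats that case as "nothing to record" instead of letting the exception fail reconciliation.

```java
import java.util.Optional;

public class SavepointObserverSketch {

    /** Hypothetical stand-in for the operator's ReconciliationException. */
    public static class ReconciliationException extends RuntimeException {
        public ReconciliationException(String message) {
            super(message);
        }
    }

    // Assumption for this sketch: only this job has a savepoint on record.
    private static final String JOB_WITH_SAVEPOINT = "job-with-savepoint";

    /**
     * Hypothetical lookup that mirrors the reported behaviour:
     * it throws when no savepoint is available for the job.
     */
    public static String getLastCheckpoint(String jobName) {
        if (!JOB_WITH_SAVEPOINT.equals(jobName)) {
            throw new ReconciliationException(
                    "Could not observe latest savepoint information");
        }
        return "/savepoints/sp-1"; // made-up path, for illustration only
    }

    /**
     * Observer that tolerates a missing savepoint. A brand new job, or one
     * that failed during startup, simply yields an empty Optional.
     */
    public static Optional<String> observeLatestCheckpoint(String jobName) {
        try {
            return Optional.of(getLastCheckpoint(jobName));
        } catch (ReconciliationException e) {
            // No savepoint is a normal state here, so it is not rethrown.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(observeLatestCheckpoint("brand-new-job"));
        System.out.println(observeLatestCheckpoint(JOB_WITH_SAVEPOINT));
    }
}
```

Whether swallowing the exception is appropriate depends on what else the lookup can fail on; a real fix would likely need to separate "no savepoint exists" from genuine REST or cluster errors.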



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
