[ 
https://issues.apache.org/jira/browse/FLINK-36673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896753#comment-17896753
 ] 

Gyula Fora commented on FLINK-36673:
------------------------------------

[~sap1ens] I still fail to understand your hypothesis here. It is true that 
previously the
getCheckpointInfo(jobId, conf);
call wasn't directly wrapped in a try-catch block here, but the error still 
comes from that call itself. In the previous operator version the same error 
would have been thrown on the caller side, in the snapshot observer, 
ultimately leading to the exact same situation.
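
To illustrate the argument, here is a simplified sketch, not the operator's 
actual code. All class and method names below are hypothetical stand-ins, and 
the boolean flag simulating a job that never started is an assumption made 
for the example. It compares a lookup that wraps the failing call in a 
try-catch with one that does not:

```java
import java.util.Optional;
import java.util.function.Supplier;

class CheckpointErrorPropagation {

    // Hypothetical stand-in for the operator's ReconciliationException.
    static class ReconciliationException extends RuntimeException {
        ReconciliationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical stand-in for the REST lookup. It fails when the job never
    // came up, and returns an empty result for a healthy job that simply has
    // no checkpoint yet (e.g. a brand new job).
    static Optional<String> getCheckpointInfo(boolean jobReachable) {
        if (!jobReachable) {
            throw new IllegalStateException("job is not reachable");
        }
        return Optional.empty();
    }

    // Variant A: the lookup wraps the underlying failure.
    static Optional<String> lookupWrapped(boolean jobReachable) {
        try {
            return getCheckpointInfo(jobReachable);
        } catch (Exception e) {
            throw new ReconciliationException(
                    "Could not observe latest savepoint information", e);
        }
    }

    // Variant B: the lookup lets the underlying failure escape unchanged.
    static Optional<String> lookupUnwrapped(boolean jobReachable) {
        return getCheckpointInfo(jobReachable);
    }

    // Hypothetical observer: it only sees whether the lookup threw or not.
    static String observe(Supplier<Optional<String>> lookup) {
        try {
            return lookup.get().orElse("no checkpoint");
        } catch (Exception e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(observe(() -> lookupWrapped(false)));
        System.out.println(observe(() -> lookupUnwrapped(false)));
        System.out.println(observe(() -> lookupWrapped(true)));
    }
}
```

In this sketch the observation fails in both variants when the underlying 
call fails; only the exception type differs. A reachable job without any 
checkpoint yields an empty result and no exception at all.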

> Operator is not properly handling failed deployments without savepoints
> -----------------------------------------------------------------------
>
>                 Key: FLINK-36673
>                 URL: https://issues.apache.org/jira/browse/FLINK-36673
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Yaroslav Tkachenko
>            Priority: Major
>         Attachments: stacktrace.txt
>
>
> I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10.
> When I deploy a FlinkDeployment that fails during the startup, I get a 
> "ReconciliationException: Could not observe latest savepoint information" 
> (full stacktrace is attached). 
> I think the issue was introduced here: 
> [https://github.com/apache/flink-kubernetes-operator/pull/871]. 
> *AbstractFlinkService.getLastCheckpoint* now throws a 
> *ReconciliationException* when a savepoint is not available, and 
> *SnapshotObserver.observeLatestCheckpoint* doesn't handle it properly. I 
> think having no savepoint is completely normal in some situations (e.g. a 
> brand new job). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)