[jira] [Commented] (FLINK-30305) Operator deletes HA metadata during stateful upgrade, preventing potential manual rollback

Gyula Fora (Jira) Tue, 06 Dec 2022 14:53:04 -0800


    [ https://issues.apache.org/jira/browse/FLINK-30305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644052#comment-17644052 ]


Gyula Fora commented on FLINK-30305:
------------------------------------

That’s correct: the savepoint from the status is only guaranteed to be the 
latest if the job was observed in a terminal state.
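
To make that rule concrete, here is a minimal sketch in Python. It is not the 
operator's actual code: the function name, its parameters, and the example 
savepoint path are hypothetical, and it assumes that "terminal" means Flink's 
globally terminal job states (FINISHED, CANCELED, FAILED).

```python
from typing import Optional

# Assumption: "terminal" is read as Flink's globally terminal job states.
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}


def savepoint_for_manual_restore(last_observed_state: str,
                                 status_savepoint: Optional[str]) -> Optional[str]:
    """Return the savepoint recorded in the status only when it is
    guaranteed to be the latest one (hypothetical helper)."""
    if last_observed_state in TERMINAL_STATES:
        # The job was observed after it stopped, so the recorded
        # savepoint is the latest state.
        return status_savepoint
    # The job was last observed in a non-terminal state: the recorded
    # savepoint is not guaranteed to be the latest, so do not rely on it.
    return None


# Hypothetical usage with a made-up savepoint path.
print(savepoint_for_manual_restore("FINISHED", "s3://bucket/savepoint-1"))
print(savepoint_for_manual_restore("RUNNING", "s3://bucket/savepoint-1"))
```

Returning None in the non-terminal case is a deliberate choice in this sketch: 
it forces the caller to look for newer state elsewhere instead of silently 
restoring from a possibly stale savepoint.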

> Operator deletes HA metadata during stateful upgrade, preventing potential 
> manual rollback
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-30305
>                 URL: https://issues.apache.org/jira/browse/FLINK-30305
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.2.0
>            Reporter: Alexis Sarda-Espinosa
>            Priority: Major
>
> I was testing resiliency of jobs with Kubernetes-based HA enabled, upgrade 
> mode = {{savepoint}}, and with _automatic_ rollback _disabled_ in the 
> operator. After the job was running, I purposely created an erroneous spec by 
> changing my pod template to include an entry in {{envFrom -> secretRef}} with 
> a name that doesn't exist. Schema validation passed, so the operator tried to 
> upgrade the job, but the new pod hangs with {{CreateContainerConfigError}}, 
> and I see this in the operator logs:
> {noformat}
> >>> Status | Info    | UPGRADING       | The resource is being upgraded
> Deleting deployment with terminated application before new deployment
> Deleting JobManager deployment and HA metadata.
> {noformat}
> Afterwards, even if I remove the non-existing entry from my pod template, the 
> operator can no longer propagate the new spec because "Job is not running yet 
> and HA metadata is not available, waiting for upgradeable state".



