[ https://issues.apache.org/jira/browse/FLINK-33222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773606#comment-17773606 ]
Nicolas Fraison commented on FLINK-33222:
-----------------------------------------

So I was wrong: the 1.7-snapshot release is not affected by this bug, thanks to the [https://github.com/apache/flink-kubernetes-operator/pull/681] patch.

Indeed, when deploying the app with an {{initialSavepointPath}}:
 * On the first reconcile loop, the lastReconciledSpec generation is updated from N to N+1 while the stable spec generation stays at N. No rollback is detected, because the [update|https://github.com/apache/flink-kubernetes-operator/pull/681/files#diff-29ea38a50cac5b4432dd0969bc3e2177e29a5507f8c7bb01b80f605a8740de41R169] is done after the [rollback|https://github.com/apache/flink-kubernetes-operator/pull/681/files#diff-29ea38a50cac5b4432dd0969bc3e2177e29a5507f8c7bb01b80f605a8740de41R146] check, so the deployment is considered DEPLOYED.
 * Then, on the second reconcile loop, the stable spec generation is also updated from N to N+1 (in [patchAndCacheStatus|https://github.com/apache/flink-kubernetes-operator/blob/release-1.6/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java#L135]) and the deployment is considered STABLE.

But this looks quite brittle to me, as just changing the position of shouldRollBack or ReconciliationUtils.updateReconciliationMetadata could lead to that bad behaviour again.
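To make the ordering dependency concrete, here is a minimal toy model of the two reconcile loops. It is only a sketch under stated assumptions: {{Status}}, {{reconcile}} and the {{checkBeforeUpdate}} flag are hypothetical stand-ins, not the operator's real classes or signatures; the readiness timeout is assumed to have already elapsed; and marking the spec stable is reduced to a single assignment at the start of each loop.

```java
// Toy model of the ordering issue; none of these types are the operator's real classes.
class RollbackOrderingSketch {

    /** Hypothetical stand-in for the reconciliation status: only the two generations are tracked. */
    static final class Status {
        long lastReconciledGeneration;
        long lastStableGeneration;

        Status(long generation) {
            this.lastReconciledGeneration = generation;
            this.lastStableGeneration = generation;
        }

        boolean isLastReconciledSpecStable() {
            return lastReconciledGeneration == lastStableGeneration;
        }
    }

    /** Simplified rollback check: assumes the readiness timeout has already elapsed. */
    static boolean shouldRollBack(Status status) {
        return !status.isLastReconciledSpecStable();
    }

    /**
     * One simplified reconcile loop for a healthy deployment.
     * Returns true if this loop would decide to roll back.
     */
    static boolean reconcile(Status status, long specGeneration, boolean checkBeforeUpdate) {
        // Assumption: a healthy deployment has its last reconciled spec marked stable
        // at the start of the loop.
        status.lastStableGeneration = status.lastReconciledGeneration;

        if (checkBeforeUpdate) {
            // Order described in the comment: rollback check first, metadata update after.
            boolean rollback = shouldRollBack(status);
            status.lastReconciledGeneration = specGeneration;
            return rollback;
        }
        // Swapped order: metadata update first, rollback check after.
        status.lastReconciledGeneration = specGeneration;
        return shouldRollBack(status);
    }

    public static void main(String[] args) {
        long n = 5;

        Status checkFirst = new Status(n);
        System.out.println("check first, loop 1: " + reconcile(checkFirst, n + 1, true));
        System.out.println("check first, loop 2: " + reconcile(checkFirst, n + 1, true));

        Status updateFirst = new Status(n);
        System.out.println("update first, loop 1: " + reconcile(updateFirst, n + 1, false));
    }
}
```

In this model the check-then-update order never reports a rollback across the two loops, while swapping the two steps reports one on the first loop, which is the brittleness in question.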
I'm wondering if we could take the generation field into account in [isLastReconciledSpecStable|https://github.com/apache/flink-kubernetes-operator/blob/release-1.6/flink-kubernetes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/status/ReconciliationStatus.java#L91].

> Operator rollback app when it should not
> ----------------------------------------
>
>                 Key: FLINK-33222
>                 URL: https://issues.apache.org/jira/browse/FLINK-33222
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>         Environment: Flink operator 1.6 - Flink 1.17.1
>            Reporter: Nicolas Fraison
>            Priority: Major
>
> The operator can decide to roll back when an update of the job spec is performed on savepointTriggerNonce or initialSavepointPath if the app has been deployed for longer than KubernetesOperatorConfigOptions.DEPLOYMENT_READINESS_TIMEOUT.
>
> This is due to the ObjectMeta generation being [updated|https://github.com/apache/flink-kubernetes-operator/blob/release-1.6/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java#L169] when those fields change, which leaves the lastReconciledSpec no longer aligned with the lastStableSpec, even though those fields are correctly ignored when checking for an upgrade diff.
>
> Looking at the main branch, we should still face the same issue, as the same [update|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractFlinkResourceReconciler.java#L169] is performed at the end of the reconcile loop.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)