I tested the application upgrade rollback feature, but it does not work: a Flink application-mode job does not roll back to the last stable spec. As shown in the following example, I declared an invalid pod template that has no container named flink-main-container in order to test rollback. However, the deployment of the Flink application job simply failed with the error below, and no rollback happened.

Error:

org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. Message: Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value]. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name, message=Required value, reason=FieldValueRequired, additionalProperties={}), StatusCause(field=spec.template.spec.containers[0].image, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=apps, kind=Deployment, name=flink-bdra-sql-application-job-s3p, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "flink-bdra-sql-application-job-s3p" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)

Env:

Flink version: Flink 1.16
Flink Kubernetes Operator: 1.3.1

Last stable spec:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3:///flink-data/savepoints
    state.checkpoints.dir: s3:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3:///flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            - name: TZ
              value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless

New spec:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3:///flink-data/savepoints
    state.checkpoints.dir: s3:///flink-data/checkpoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3:///flink-data/ha
  serviceAccount: flink
  podTemplate:
    spec:
      containers:
        - env:
            - name: TZ
              value: Asia/Shanghai
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless

--
Best,
Hjw