gyfora opened a new pull request, #271:
URL: https://github.com/apache/flink-kubernetes-operator/pull/271
This change allows the operator to recover from errors that occur during or directly after Flink cluster submissions. One common cause is temporary unavailability of the Kubernetes API, which prevents us from updating the status after deployment. Before this change, this would lead to a fatal error: the operator would try to submit the Flink cluster again and again, even though it was already deployed.

To solve this, the following changes were introduced:

- LastReconciledSpec now contains the CR generation for the deployed spec. During upgrades, the generation is recorded in the first SUSPEND step of the upgrade operation.
- The CR generation is also added as an annotation to the Flink cluster Deployment object.
- The default ReconciliationStatus for new deployments is now UPGRADING (previously it was DEPLOYED).
- In the observer, if the reconciliation status is UPGRADING (new or upgrading deployments), we check whether the Deployment exists and, if so, compare its generation annotation with the target generation. If they match, we know the upgrade was successful, so we update the status.
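A minimal Java sketch of the observer check described above, assuming the Kubernetes Deployment can be reduced to its annotation map and the target generation is the one recorded in LastReconciledSpec. The UPGRADING and DEPLOYED states come from the PR; everything else (the `State` enum, the `GENERATION_ANNOTATION` key, `ClusterDeployment`, `deploy`, `observe`) is hypothetical and not the operator's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the generation check; names are illustrative, not the operator's real API.
class GenerationAwareObserverSketch {

    // Reconciliation states mentioned in the PR; the real status type carries more information.
    enum State { UPGRADING, DEPLOYED }

    // Hypothetical annotation key; the key the operator actually uses may differ.
    static final String GENERATION_ANNOTATION = "example.org/cr-generation";

    // Hypothetical stand-in for the Flink cluster Deployment, reduced to its annotations.
    static final class ClusterDeployment {
        final Map<String, String> annotations;

        ClusterDeployment(Map<String, String> annotations) {
            this.annotations = annotations;
        }
    }

    // Submission side: stamp the CR generation of the deployed spec onto the Deployment.
    static ClusterDeployment deploy(long crGeneration) {
        Map<String, String> annotations = new HashMap<>();
        annotations.put(GENERATION_ANNOTATION, Long.toString(crGeneration));
        return new ClusterDeployment(annotations);
    }

    // Observer side: only UPGRADING resources are checked. If the Deployment exists and its
    // annotation matches the target generation, the submission is treated as successful.
    static State observe(State current, long targetGeneration, Optional<ClusterDeployment> deployment) {
        if (current != State.UPGRADING) {
            return current;
        }
        boolean generationMatches = deployment
                .map(d -> d.annotations.get(GENERATION_ANNOTATION))
                .map(value -> value.equals(Long.toString(targetGeneration)))
                .orElse(false);
        return generationMatches ? State.DEPLOYED : State.UPGRADING;
    }

    public static void main(String[] args) {
        // Cluster was submitted with generation 2, but the status update failed afterwards.
        System.out.println(observe(State.UPGRADING, 2L, Optional.of(deploy(2L)))); // DEPLOYED
        // No Deployment found: the status stays UPGRADING.
        System.out.println(observe(State.UPGRADING, 2L, Optional.empty())); // UPGRADING
        // Deployment still carries an older generation: the status stays UPGRADING.
        System.out.println(observe(State.UPGRADING, 3L, Optional.of(deploy(2L)))); // UPGRADING
    }
}
```

The first case is the failure mode the PR targets: even though the status update after submission was lost, the next observation finds the annotated Deployment and advances the status instead of submitting the cluster again. Comparing against a generation stamped on the Deployment, rather than relying only on the stored status, is what makes the check safe to repeat.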