[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #271: [FLINK-27820] Handle deployment errors on observe

GitBox Fri, 17 Jun 2022 06:13:42 -0700


gyfora opened a new pull request, #271:
URL: https://github.com/apache/flink-kubernetes-operator/pull/271


   This change allows the operator to recover from errors that occur during or 
directly after a Flink cluster submission. A common cause is temporary 
unavailability of the Kubernetes API, which prevents us from updating the 
status after deployment.
   
   Before this change, this led to a fatal error in which the operator tried to 
submit the Flink cluster again and again, even though it was already 
deployed.
   
   To handle this case, the following changes were introduced:
   
   - LastReconciledSpec now contains the CR generation of the deployed spec. 
During upgrades, the generation is recorded in the first SUSPEND step of the 
upgrade operation.
   - The CR generation is also added as an annotation to the Flink cluster's 
Deployment object.
   - The default ReconciliationStatus for new deployments is now UPGRADING 
(previously it was DEPLOYED).
   - In the observer, if the reconciliation status is UPGRADING (new or 
upgrading deployments), we check whether the Deployment exists and, if so, 
compare its generation annotation with the target generation. If they match, 
we know the upgrade succeeded, so we update the status accordingly.
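
   The observer check in the last bullet can be illustrated with a small, 
self-contained Java sketch. It is not the operator's actual code: the 
`ObserveSketch` class, the `GENERATION_ANNOTATION` key, the in-memory map 
standing in for the Kubernetes API, and the choice to mark the status as 
DEPLOYED on a match are all assumptions made for illustration. The sketch 
also assumes that a missing Deployment simply leaves the status untouched so 
the reconciler can submit again.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: recovering from a lost status update after a Flink
 * cluster submission. Names and the annotation key are illustrative only.
 */
public class ObserveSketch {

    // Hypothetical annotation key holding the CR generation on the Deployment.
    static final String GENERATION_ANNOTATION = "example.org/cr-generation";

    enum ReconciliationState { UPGRADING, DEPLOYED }

    // Simplified CR status: reconciliation state plus the CR generation
    // recorded together with the last reconciled spec.
    static final class Status {
        ReconciliationState state = ReconciliationState.UPGRADING; // default for new deployments
        long lastReconciledGeneration;
    }

    // In-memory stand-in for the Kubernetes API: Deployment name -> annotations.
    static final Map<String, Map<String, String>> DEPLOYMENTS = new HashMap<>();

    // Submitting a cluster stamps the target CR generation onto its Deployment.
    static void submit(String name, long generation) {
        Map<String, String> annotations = new HashMap<>();
        annotations.put(GENERATION_ANNOTATION, Long.toString(generation));
        DEPLOYMENTS.put(name, annotations);
    }

    // Observer step: only new or upgrading deployments need the check.
    static void observe(String name, Status status) {
        if (status.state != ReconciliationState.UPGRADING) {
            return;
        }
        Map<String, String> annotations = DEPLOYMENTS.get(name);
        if (annotations == null) {
            // Assumption of this sketch: no Deployment means the submission did
            // not happen, so leave the status alone and let reconcile retry.
            return;
        }
        String deployedGeneration = annotations.get(GENERATION_ANNOTATION);
        if (Long.toString(status.lastReconciledGeneration).equals(deployedGeneration)) {
            // The running cluster belongs to the target generation, so the
            // earlier submission succeeded: record it instead of resubmitting.
            status.state = ReconciliationState.DEPLOYED;
        }
    }

    public static void main(String[] args) {
        Status status = new Status();
        status.lastReconciledGeneration = 2; // target generation recorded before submission
        submit("my-cluster", 2);
        // Simulate the failure case: the status update after submit() was lost,
        // so the state is still UPGRADING when the observer runs again.
        observe("my-cluster", status);
        System.out.println(status.state); // prints DEPLOYED
    }
}
```

   Comparing generations, rather than only checking that a Deployment exists, 
keeps an older cluster (annotated with a previous generation) from being 
mistaken for a successful upgrade.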


-- 
This is an automated message from the Apache Git Service.
