Just to add: even with your custom rollback mechanism, you can check the last validated spec field to get the proper image ID when rolling back.
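As a rough illustration of that lookup, here is a minimal Python sketch. It assumes the FlinkDeployment status stores the last stable spec as a JSON string under `status.reconciliationStatus.lastStableSpec`, possibly wrapped in a top-level `spec` key. That field path and the helper name `last_stable_image` are assumptions, so please verify them against your operator version.

```python
import json
from typing import Any, Dict, Optional


def last_stable_image(flink_deployment: Dict[str, Any]) -> Optional[str]:
    """Return the image recorded in the last stable spec, or None if absent.

    Assumption: the spec lives at status.reconciliationStatus.lastStableSpec
    and is serialized as a JSON string; check this for your operator version.
    """
    status = flink_deployment.get("status") or {}
    raw = (status.get("reconciliationStatus") or {}).get("lastStableSpec")
    if not raw:
        return None
    # Accept either a serialized JSON string or an already-parsed mapping.
    stable = json.loads(raw) if isinstance(raw, str) else raw
    # Some versions may wrap the actual spec in a top-level "spec" key.
    spec = stable.get("spec", stable)
    return spec.get("image")
```

The input would be the parsed resource, for example the output of `kubectl get flinkdeployment <name> -o json` loaded with `json.loads`.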
On Sun, 27 Apr 2025 at 21:07, Alex Nitavsky <alexnitav...@gmail.com> wrote:

> A priori, your solution makes sense.
> Just keep in mind that some form of blue-green deployment may become part
> of master soon.
>
> On our side, we have relied on this experimental feature for a while, and
> it works fine.
> It probably also makes sense to consider contributing, or at least
> requesting, a change so that the verification logic can be plugged in.
>
> On Sun, 27 Apr 2025 at 21:05, Ehud Lev <ehud....@forter.com> wrote:
>
>> Hi Alex,
>>
>> Thanks for the response!
>>
>> Yes, we did consider the "Application upgrade rollbacks (Experimental)"
>> feature.
>> However, we decided not to use it, mainly for two reasons:
>>
>> 1. We wanted the flexibility to run our own custom verification logic
>>    after deployment.
>> 2. The "experimental" label made us concerned about potential
>>    instability in production environments.
>>
>> Regarding the blue-green deployment feature: as far as I know, it hasn’t
>> been implemented yet. Please correct me if I’m wrong!
>> Do you know if it's getting close to being ready?
>>
>> Also, based on what I described, do you think our current approach makes
>> sense?
>> Are there any pitfalls you think we might be missing?
>>
>> Thanks again for your help!
>>
>> On Sun, Apr 27, 2025 at 9:48 PM Alex Nitavsky <alexnitav...@gmail.com>
>> wrote:
>>
>>> Hey,
>>>
>>> Did you consider using the Apache operator rollback feature? It can
>>> probably cover the basic verification needs. Generally, I would consider
>>> improving the Apache operator rollback mechanism if it is not
>>> sufficient.
>>>
>>> If not, it is worth checking the feature request for blue-green
>>> deployment in the operator. We rely on a similar in-house mechanism to
>>> perform more complex verifications.
>>>
>>> Regards
>>> Alex
>>>
>>> On Sun, 27 Apr 2025 at 20:44, Ehud Lev <ehud....@forter.com> wrote:
>>>
>>>> Hi Flink users,
>>>>
>>>> We have a few Flink topologies running in production, managed by the
>>>> Flink Kubernetes Operator, and we typically deploy using ArgoCD.
>>>>
>>>> Occasionally, we encounter bad deployments and need to roll back. When
>>>> the job state is not critical, we usually delete the state and restart
>>>> the Flink job, relying on Kafka to manage the offsets. In some cases, we
>>>> roll back to a specific savepoint, but managing savepoints manually has
>>>> been difficult and error-prone.
>>>>
>>>> To improve this, we built a deployment verification and rollback
>>>> automation using GitHub Actions and ArgoCD APIs. Here's the high-level
>>>> flow:
>>>>
>>>> - Read the current (previous) deployment information (savepoint
>>>>   location, version, revision, etc.).
>>>> - Trigger a new deployment using ArgoCD, with a postSync job that
>>>>   runs topology-specific verification scripts.
>>>> - Check whether the deployment succeeded or failed.
>>>> - If successful:
>>>>   - Send a Slack notification with deployment details.
>>>> - If failed:
>>>>   - Capture the new savepoint created during the failed deployment.
>>>>   - Verify that this savepoint is different from the previous one.
>>>>   - Automatically roll back by patching the deployment to use the
>>>>     previous stable savepoint.
>>>>   - Send a Slack notification about the rollback.
>>>>
>>>> The postSync job also includes some custom validation logic for each
>>>> topology.
>>>>
>>>> *My questions:*
>>>>
>>>> - Does this approach make sense?
>>>> - Is this considered a bad practice?
>>>> - Has anyone else built something similar or solved deployment
>>>>   verification and rollback in a different way?
>>>>
>>>> Would love to hear your thoughts and any lessons learned.
>>>>
>>>> Thanks!
>>>> --
>>>> Ehud Lev, Staff Engineer
>>>>
>>
>> --
>> Ehud Lev, Staff Engineer
>> email: ehud....@forter.com web: www.forter.com
>> mobile: 052-5832253
>>
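
The verification and rollback flow described in the original message can be sketched in Python roughly as follows. This is only an illustration of the control flow, not the actual GitHub Actions or ArgoCD integration: `DeploymentInfo`, `deploy_with_rollback`, and the four injected callables are hypothetical names. The thread does not say what happens when no new savepoint is found after a failed deployment, so stopping for manual intervention in that case is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DeploymentInfo:
    """Hypothetical snapshot of a deployment: version plus savepoint location."""
    version: str
    savepoint: Optional[str]


def deploy_with_rollback(
    read_current: Callable[[], DeploymentInfo],
    deploy_and_verify: Callable[[], bool],
    rollback_to: Callable[[DeploymentInfo], None],
    notify: Callable[[str], None],
) -> str:
    # Step 1: record the previous deployment before touching anything.
    previous = read_current()

    # Steps 2-3: deploy and run verification (e.g. a sync with a postSync job).
    if deploy_and_verify():
        notify(f"Deployment succeeded (previous version: {previous.version})")
        return "deployed"

    # Failure path: capture the state left behind by the failed deployment.
    failed = read_current()
    if failed.savepoint == previous.savepoint:
        # Assumption: without a new savepoint, do not roll back automatically.
        notify("Deployment failed and no new savepoint was found; manual action needed")
        return "manual"

    # Patch the deployment back to the previous stable savepoint.
    rollback_to(previous)
    notify(f"Rolled back to {previous.version} using savepoint {previous.savepoint}")
    return "rolled_back"
```

Passing the side effects in as callables keeps the decision logic testable without a cluster; the real implementations would wrap the ArgoCD and Slack calls.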