Hi Sergio,
Thanks for starting this conversation.

A few thoughts regarding BlueGreen Phase 2:
1. The Gate Operator is interesting, but I don't like that we would have
to modify users' pipelines for them to use Phase 2. The gate function seems
like it could instead be a Mixin that connectors implement: if you want to
use Phase 2, your sinks must implement this Mixin. I understand that a
single, standalone GateFunction has pros, but it works less well with
Flink SQL, and the trade-off doesn't seem worthwhile to me.
2. Regarding the ConfigMap: we should consider a solution that supports
migrating Flink jobs between Kubernetes clusters. Otherwise, Phase 2 is
only useful for in-cluster operations.
3. Watermarking is a requirement. Will the Flink Kubernetes Operator
validate that the pipeline is using watermarks?
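
To make point 1 a bit more concrete, here is a rough sketch of the Mixin
shape I have in mind. Every name in it (SupportsBlueGreenGate, Role,
cutoverTimestamp, ListSink) is hypothetical and not taken from the FLIP or
the draft PR, and the cutover rule (Blue writes records before a cutover
timestamp, Green writes from it onwards) is only my assumption about how
the gating decision could look. It uses plain Java with no Flink
dependencies so it can be run on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class GateMixinSketch {

    /** Which side of the Blue/Green pair a deployment is (hypothetical). */
    public enum Role { BLUE, GREEN }

    /** Hypothetical Mixin a sink would implement to opt in to Phase 2. */
    public interface SupportsBlueGreenGate {
        Role role();

        /** Assumed event-time cutover point shared by both deployments. */
        long cutoverTimestamp();

        /** Blue emits strictly before the cutover, Green from the cutover on. */
        default boolean shouldEmit(long eventTimestamp) {
            return role() == Role.BLUE
                    ? eventTimestamp < cutoverTimestamp()
                    : eventTimestamp >= cutoverTimestamp();
        }
    }

    /** Toy sink that collects records in memory and applies the gate itself. */
    public static final class ListSink implements SupportsBlueGreenGate {
        private final Role role;
        private final long cutoverTimestamp;
        private final List<String> written = new ArrayList<>();

        public ListSink(Role role, long cutoverTimestamp) {
            this.role = role;
            this.cutoverTimestamp = cutoverTimestamp;
        }

        @Override
        public Role role() {
            return role;
        }

        @Override
        public long cutoverTimestamp() {
            return cutoverTimestamp;
        }

        /** Writes the record only if this deployment owns its timestamp. */
        public void write(String value, long eventTimestamp) {
            if (shouldEmit(eventTimestamp)) {
                written.add(value);
            }
        }

        public List<String> written() {
            return written;
        }
    }

    public static void main(String[] args) {
        ListSink blue = new ListSink(Role.BLUE, 100L);
        ListSink green = new ListSink(Role.GREEN, 100L);
        long[] timestamps = {98L, 99L, 100L, 101L};
        for (long t : timestamps) {
            // Both deployments see every record; the gate decides who writes it.
            blue.write("r" + t, t);
            green.write("r" + t, t);
        }
        System.out.println("blue=" + blue.written() + " green=" + green.written());
    }
}
```

The point of the sketch is that the gating decision lives in the sink, so
the user's pipeline topology stays untouched and each record is written by
exactly one of the two deployments.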

>What happens when idleness is configured? Watermarks will get ignored from
these “slow” subtasks and advance, could records from the ignored subtasks
eventually be lost?
Yes, they would be lost, but that would happen irrespective of Phase 2.

I'll have more thoughts after we discuss the Gate Operator, as that is
crucial to the FLIP right now.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform


On Mon, Mar 2, 2026 at 6:52 PM Sergio Chong Loo <[email protected]> wrote:

> Bumping this (Advanced Blue/Green deployments - FLIP-504) thread after
> making some code adjustments.
>
> FYI @drossos <https://github.com/drossos> @ryanvanhuuksloot <
> https://github.com/ryanvanhuuksloot> I’d like to get your feedback since
> I know you’re interested in this feature.
>
> Thanks,
> - Sergio
>
>
> > On Dec 5, 2025, at 2:31 PM, Sergio Chong Loo <[email protected]>
> wrote:
> >
> > Hi folks,
> >
> > FLIP-503 (already merged) introduced the Basic Blue/Green Deployment
> functionality to the Flink K8s Operator. It was very straightforward,
> simply transitioning to the second deployment once it's considered stable.
> >
> > FLIP-504 is an Advanced version added on top of 503 and brings about the
> notion of "record-level" coordination between the 2 deployments to have no
> data duplication and exactly once semantics while preserving a smooth
> transition.
> >
> > The main goals are:
> >     • For the community to take a quick look at the current
> functionality (previously mentioned at the Flink Forward 2025 Conference)
> >     • To get feedback and improvement suggestions
> >
> > FLIP-504 details:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677650
> >
> > Draft PR: https://github.com/apache/flink-kubernetes-operator/pull/1043
> >
> > Thank you!
> > - Sergio
> >
> >
>
>
