Awesome! Thanks for the update, Sergio. I'm excited to see the plan - it is a cool idea so I'm glad it is working.
Ryan van Huuksloot Staff Engineer, Infrastructure | Streaming Platform [image: Shopify] <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> On Thu, Mar 26, 2026 at 1:46 AM Sergio Chong Loo <[email protected]> wrote: > @Ryan / @Daniel, > > Good news! The “GateInjectorPipelineExecutor” idea is successful!! > > While the original approach of simply activating it with > “execution.target” did not quite work, I was able to implement it via the > *Instrumentation > API with a Java Agent* that injects it… the user doesn’t have to touch > their pipelines and I added 2 options, at least for now, to place/inject > the Gate after the source or before the sink (complex DAG cases with > multiple sources or sinks for now are not supported). > > I’m documenting everything and prepping the Draft PR for your review, > probably a couple more days. > > Thanks, stay tuned. > > - Sergio > > > On Mar 16, 2026, at 3:30 PM, Sergio Chong Loo <[email protected]> wrote: > > Thanks for the ideas and the offer to help out Ryan! It’s invaluable to > learn about how other users/teams scenarios. > > Indeed I have to pursue and evaluate the GateInjectorExecutor nonetheless > for our internal development. Ideally it’d be great if the user can simply > “invoke” the functionality, even give the user an option to specify “where" > the gating mechanism to be placed (e.g. right after the source or before > sink), or for the most flexibility they can incorporate and place the Gate > manually just like it is now. > > I’ll share the progress asap and we can all take it from there (this > should not exceed a couple weeks). I’ll definitely need more of your > feedback to verify this with Flink SQL. > > Thanks again, > Sergio > > > On Mar 16, 2026, at 7:01 AM, Ryan van Huuksloot < > [email protected]> wrote: > > Hi Sergio, > > re: 1.1 > My thought is that a BlueGreen Mixin isn't Kubernetes specific and could > be reused by other deployment control planes. However, I do agree that > attaching it to the sink has other implications so I am happy to pivot if > we can find an alternative solution. > > re: 2 > I'm happy to leave it out of the Phase 2 implementation, but I think it > should be possible. For example we use Phase 1 with cross cluster > migrations today. Phase 2 within a single cluster isn't particularly useful > for us. > > re: GateInjectorExecutor > This sounds like a neat idea. I need to read more about how it would work > but from a high level, injecting an operator before your sinks sounds like > a good idea. Better isolation, possible with SQL, no mixins, etc. > > I will mention that part of the reason I want it before the sinks is > because nine out of ten people building pipelines struggle to understand > where their state is and how Phase 2 would affect the correctness of their > state depending on where they put the gate. I understand that if you have a > remote lookup and want to save bandwidth, you could optimize your pipeline > by moving the gate before the remote call; however, that seems like an > optimization that can be made later. > > Thanks for driving this! Let me know how we can help. > > Ryan van Huuksloot > Staff Engineer, Infrastructure | Streaming Platform > [image: Shopify] > <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> > > > On Mon, Mar 16, 2026 at 2:21 AM Sergio Chong Loo <[email protected]> > wrote: > >> Hi Ryan >> >> Thanks a lot for these details. For sure some of these observations >> popped up during our initial discussions, and that’s why our initial goal >> was to introduce this as simple as possible and gradually enhance it to >> cover gaps. >> >> Allow me to address your concerns: >> >> 1. I’m happy you stressed the point of “disruption to existing >> pipelines”. However, there’s a few points about attempting to build this >> functionality into the sinks (or sources) right off the bat (read further >> below for my alternative): >> 1. Kubernetes centric: as of now the Blue/Green Deployments >> support is a Kubernetes specific solution, adding a mixin directly >> available to sinks would “leak” this support outside of K8s >> 2. A sink being aware of these deployment phases violates single >> responsibility, but more importantly… >> 3. Flink currently has many connectors, with the majority being >> maintained outside of the Flink code base, by separate teams, separate >> repos, separate release cycles. This would complicate things >> significantly >> as to try and add support for this for every potential flink connector >> project out there would be a cumbersome. Blue/Green Phase 2 then only >> would >> works with "gate-aware" sinks. >> 2. I’d leave the conversation about migrating jobs between K8s >> clusters outside of this scope, even Phase 1 is meant to only work in a >> single cluster… >> 3. Watermarking, excellent point, it’s indeed a requirement so I’ll >> make sure this is validated where applicable (by the concrete >> implementation) >> >> >> Having said what I said about point 1.1 above, I’m currently working on >> an approach which uses a “GateInjectorPipelineExecutor” so to speak; in >> other words a custom PipelineExecutor that would be shipped with the K8s >> Operator, invoked by Flink Configuration (via “execution.target:”). This >> custom piece would instantiate and inject the Gate at a fixed point in the >> StreamGraph right before job submission. I still have to validate and >> ensure a few things are correctly taken care of (like Type Information, >> etc.) but the theory looks promising. >> >> For the most part this works well with Flink SQL (same configuration), >> here’s my estimation: >> >> tEnv.executeSql("INSERT INTO my_sink ...") >> └─> SQL planner → ExecNodeGraph → Transformation[] >> └─> StreamGraph >> └─> GateInjectorExecutor injects GateProcessFunction >> └─> StreamGraph' (mutated) → JobGraph >> └─> Submit Job >> >> I’m aiming to share some updates along these lines in the next few weeks >> but hopefully this falls inline with your objectives/thoughts overall. >> >> Sergio >> >> >> On Mar 6, 2026, at 3:36 PM, Ryan van Huuksloot via dev < >> [email protected]> wrote: >> >> Hi Sergio, >> Thanks for starting this conversation. >> >> A few thoughts regarding BlueGreen Phase 2: >> 1. The Gate Operator is interesting but I don't like that we would have to >> modify users' pipelines for them to use Phase 2. This gate function seems >> like it could be a Mixin that connectors would implement. If you want to >> use Phase 2, your sinks must implement this Mixin. I understand that a >> unique GateFunction has pros, but it works less well with FlinkSQL - and >> the trade-off doesn't seem worthwhile. >> 2. Regarding the ConfigMap. We should consider a solution that supports >> migrating Flink jobs between Kubernetes clusters. Otherwise Phase 2 is >> only >> useful for in cluster operations. >> 3. Watermarking is a requirement. Will the Flink Kubernetes Operator >> validate that the pipeline is using watermarks? >> >> What happens when idleness is configured? Watermarks will get ignored from >> >> these “slow” subtasks and advance, could records from the ignored subtasks >> eventually be lost? >> Yes they would be lost, but that would happen irrespective of Phase 2. >> >> I'll have more thoughts after we discuss the Gate Operator, as that is >> crucial to the FLIP right now. >> >> Ryan van Huuksloot >> Staff Engineer, Infrastructure | Streaming Platform >> [image: Shopify] >> <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> >> >> >> On Mon, Mar 2, 2026 at 6:52 PM Sergio Chong Loo <[email protected]> >> wrote: >> >> Bumping this (Advanced Blue/Green deployments - FLIP-504) thread after >> making some code adjustments. >> >> FYI @drossos <https://github.com/drossos> @ryanvanhuuksloot < >> https://github.com/ryanvanhuuksloot> I’d like to get your feedback since >> I know you’re interested in this feature. >> >> Thanks, >> - Sergio >> >> >> On Dec 5, 2025, at 2:31 PM, Sergio Chong Loo <[email protected]> >> >> wrote: >> >> >> Hi folks, >> >> FLIP-503 (already merged) introduced the Basic Blue/Green Deployment >> >> functionality to the Flink K8s Operator. It was very straightforward, >> simply transitioning to the second deployment once it's considered stable. >> >> >> FLIP-504 is an Advanced version added on top of 503 and brings about the >> >> notion of "record-level" coordination between the 2 deployments to have no >> data duplication and exactly once semantics while preserving a smooth >> transition. >> >> >> The main goals are: >> • For the community to take a quick look at the current >> >> functionality (previously mentioned at the Flink Forward 2025 Conference) >> >> • To get feedback and improvement suggestions >> >> Flip 504 details: >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677650 >> >> >> Draft PR: https://github.com/apache/flink-kubernetes-operator/pull/1043 >> >> Thank you! >> - Sergio >> >> >
