*Disclaimer: Wanted to insert the email from Jose, so it is not lost from the email thread. By mistake it wasn't correctly answered.*
Hi all, At Datadog, we have started to leverage a blue/green architecture for Flink jobs in production that is similar to what has been discussed so far. It uses a FlinkBlueGreenDeployment CR and a custom k8s operator that creates the underlying FlinkDeployment CRs. For coordination between the operator and jobs, we use event time watermarks, Zookeeper (since we use this for HA) and broadcast process functions. A SinkV2 based API allows Flink jobs to wire into the blue/green topology. The new deployment can be validated externally before it can write data visible to downstream consumers (internal teams use Datadog monitors). Sergio, we'd love to share what we have learned. -- Jose Vargas (he/him) Software Engineer II, Standalone Metering DATADOG 370 17th Street 33rd Floor, Denver, CO 80202 USA <https://maps.google.com/?q=370%2017th%20Street%2033rd%20Floor,%20Denver,%20CO%2080202%20USA> Twitter | <https://twitter.com/datadoghq/> Instagram | <https://www.instagram.com/datadoghq/> YouTube | <https://www.youtube.com/user/DatadogHQ/> LinkedIn | <https://www.linkedin.com/company/datadog/> We're Hiring <https://www.datadoghq.com/careers/> On Fri, Dec 6, 2024 at 5:55 PM Alex Nitavsky <alexnitav...@gmail.com> wrote: > Hi Sergio, > Thanks for the proposal. > > There are certainly many corner cases to address when switching from a > Green to Blue deployment to keep data active. Some points that I believe > haven't been listed yet include: > > - Should this reusable ProcessFunction filter out data or simply tag > them? For example, filtering data for green deployments would be suitable > in the case of Async IO outputs. > - Should we also modify the Source API to allow it to provide COLOR to > the Source functions? This could be used, for instance, to generate a > different groupId for Kafka. > > > Additionally, have you considered including a validation mechanism for > Blue-Green deployment? For example, allowing users to perform automatic > validation before promoting Green to Blue. After Green is deployed, we > could wait for an external ConfigMap change or a property change in the > FlinkBlueGreenDeployment CRD to start the promotion. > > Regards > Oleksandr > > > On Thu, Dec 5, 2024 at 8:32 PM Sergio Chong Loo > <schong...@apple.com.invalid> wrote: > >> Thanks everyone for your input and encouragement! >> >> Yes all of these are questions that have popped up one way or another, it >> won’t be easy to generalize this solution but hopefully we can devise a >> good initial/extensible architecture. >> >> I’ll get to work on the FLIP and make sure all these points are included. >> >> - Sergio >> >> > On Dec 5, 2024, at 5:34 AM, Alexander Fedulov < >> alexander.fedu...@gmail.com> wrote: >> > >> > Hi Sergio, >> > >> > Thank you for starting the discussion. Blue/Green deployment support >> would >> > indeed be a great feature. >> > +1 on initiating the FLIP. >> > >> > In addition to David's points about data consistency, I think it is also >> > important to mention in the FLIP that this deployment procedure will >> > probably not be compatible with changes in parallelism between blue and >> > green. Such changes could potentially lead to differences in watermark >> > propagation and discrepancies in the produced data between the two jobs. >> > >> > Best, >> > Alex >> > >> > On Wed, 4 Dec 2024 at 12:36, David Radley <david_rad...@uk.ibm.com> >> wrote: >> > >> >> Hi Sergio, >> >> +1 for starting a FLIP. >> >> I am wondering how non zero values for table.exec.source.idle-timeout >> and >> >> table.exec.state.ttl side effects, as they are based on clock time. It >> will >> >> be interesting to identify these less stable scenarios and see what we >> can >> >> do with them. >> >> >> >> Kind regards, David. >> >> >> >> >> >> >> >> >> >> From: Maximilian Michels <m...@apache.org> >> >> Date: Wednesday, 4 December 2024 at 09:56 >> >> To: schong...@apple.com.invalid <schong...@apple.com.invalid> >> >> Cc: dev@flink.apache.org <dev@flink.apache.org> >> >> Subject: [EXTERNAL] Re: Blue/Green Deployments support for Flink >> >> Hi Sergio, >> >> >> >> Out of the box blue/green deployments would be a great addition to >> Flink. >> >> >> >> +1 for starting a FLIP. That will allow us to better describe the >> >> architecture and flesh out the technical details. I reckon the handover >> >> between the two pipelines is going to be the most difficult part. >> >> Particularly, how to ensure that both pipelines are in sync with >> respect to >> >> the point of handover. >> >> >> >> -Max >> >> >> >> On Tue, Dec 3, 2024 at 11:32 PM Sergio Chong Loo >> >> <schong...@apple.com.invalid> wrote: >> >> >> >>> Hi Danny… and community, >> >>> >> >>> At this point would it make sense for me to begin putting a FLIP >> together >> >>> with more details and perhaps continue the conversation from there? >> >>> >> >>> Thanks, >> >>> Sergio >> >>> >> >>>> On Nov 30, 2024, at 7:29 AM, Sergio Chong Loo <schong...@apple.com> >> >>> wrote: >> >>>> >> >>>> Hey Danny >> >>>> >> >>>> Thanks for digging deeper into this topic! >> >>>> >> >>>> Indeed we’ve been giving a thought to most of these points but you’ve >> >>> raised a couple of interesting ones, here are some ideas (inline): >> >>>> >> >>>> >> >>>>> On Nov 27, 2024, at 4:00 AM, Danny Cranmer <dannycran...@apache.org >> > >> >>> wrote: >> >>>>> >> >>>>> Hello Sergio, >> >>>>> >> >>>>> Thankyou for starting this discussion, I have a few questions. >> >>>>> >> >>>>>> having 2 identical pipelines running side-by-side >> >>>>> How do you ensure correctness between the 2 identical pipelines? For >> >>> example, processing time semantics or late data can result in >> different >> >>> outputs. >> >>>>> Some Sources have consumption quotas, for example Kinesis Data >> >> Streams, >> >>> this may end up eating into this quota and cause problems. >> >>>>> How do we handle sources like Queues when they cannot be consumed >> >>> concurrently? >> >>>> >> >>>> Yes for this analysis we’ve been staying away from Processing Time >> >> since >> >>> it’s (even for a single pipeline) to even “replay” idempotently. The >> most >> >>> stable scenario so far has been Event Time with Watermarks (the >> solution >> >>> will be extensible to accommodate other non-Watermark scenarios). >> >>>> >> >>>> The Blue deployment should start from Green's most recent Checkpoint >> >>> that way we minimize the amount of time it needs to “catch up”, with >> >> Event >> >>> Time is easier to ensure that catch up portion will be replayed >> virtually >> >>> the same way as Green’s. >> >>>> >> >>>> Since start up times, checkpointing intervals and overall the nature >> of >> >>> the data are strictly Business specific, the user should have a >> >>> configurable way (in this case via a “future” Watermark value) to >> >> indicate >> >>> when the transition between blue/green will occur. In other words, >> both >> >>> pipelines have the same configured “future” Watermark value to >> >> transition, >> >>> both pipelines “see” the exact same events/records, therefore the >> >>> Blue/Green Gates can both start/stop the record emission as soon as >> the >> >>> configured Watermark is reached… they’re mutually exclusive so the >> >> records >> >>> should pass through one gate or another. >> >>>> >> >>>> I don’t have experience yet with sources that cannot be consumed >> >>> concurrently, this will be a good one to analyze. >> >>>> >> >>>>> >> >>>>>> we explore the idea of empowering the pipeline to decide, at the >> >>> record level, what data goes through and what doesn’t (by means of a >> >> “Gate” >> >>> component). >> >>>>> What happens if the Green job gets ahead of the Blue job? How will >> you >> >>> pick a stopping point (per channel) to stop at consistently >> (checkpoint >> >>> barriers solve this already, right?). If you use Point in (event) >> Time, >> >> you >> >>> need to handle idle channels. >> >>>> >> >>>> Hopefully the answer above addresses this, if not I’m happy to add >> more >> >>> clarification. >> >>>> >> >>>>> >> >>>>>> This new Controller will work in conjunction with a custom Flink >> >>> Process Function >> >>>>> Would this rely on the user including the operator in the job graph, >> >> or >> >>> would you somehow dynamically inject it? >> >>>>> Where would you put this operator in complex Job graphs or forests? >> >>>> >> >>>> Yes, initially an idea is to have a reusable ProcessFunction that the >> >>> user can place in their pipelines wherever they see fit. So far it >> seems >> >>> like a good idea to place it towards the end of the pipeline, e.g. >> >> before a >> >>> sink, that way the majority of the state of that job can be exploited >> and >> >>> preserved until we absolutely know we can tear the old Green pipeline >> >> down… >> >>> or rollback. >> >>>> >> >>>> Another idea, but it would be more intrusive and harder to test in an >> >>> initial iteration, is to add this as base Sink functionality; this way >> >> the >> >>> sink could know whether to write the record(s) or not. >> >>>> >> >>>>> >> >>>>>> A ConfigMap will be used to both contain the deployment parameters >> >> as >> >>> well as act as a communication conduit between the Controller and the >> two >> >>> Jobs. >> >>>>> Will a conflig map write/read cycle be fast enough? If you write >> into >> >>> the CM "stop at record x" then the other Job might read past record x >> >>> before it sees the command >> >>>> >> >>>> Oh for sure this can happen. That’s why the user should have the >> >>> capability of defining the custom “future” cutover point, because the >> >> time >> >>> span their pipeline needs will be entirely business specific. For >> safety, >> >>> our ProcessFunction could even enforce a minimum default “time to >> wait” >> >> or >> >>> “minimum Watermark transition value” from the moment the pipeline >> started >> >>> processing records or something similar. >> >>>> >> >>>>> >> >>>>>> This function will listen for changes to the above mentioned >> >>> ConfigMap and each job will know if it’s meant to be Active or >> Standby, >> >> at >> >>> record level. >> >>>>> How will this work with exactly once sinks when you might have open >> >>> transactions? >> >>>> >> >>>> The same principle remains, records should be written to either >> Blue’s >> >>> transaction or to Green’s transaction. >> >>>> >> >>>> Another way to illustrate these components is: >> >>>> >> >>>> - Once a transition is initiated, Green’s Gate PF becomes >> StandBy >> >>> and Blue’s Gate PF becomes Active, however they gradually open or >> close >> >>> (mutually exclusive) in this fashion: >> >>>> >> >>>> - Green’s Gate will taper off the emission of records as >> >>> soon as each subtask’s Watermark crosses the configured transition >> value >> >>>> - Blue’s Gate will gradually start emitting records as >> >>> soon as each subtask’s Watermark crosses the configured transition >> value >> >>>> >> >>>>> >> >>>>> Thanks, >> >>>>> Danny >> >>>>> >> >>>>> On Mon, Nov 25, 2024 at 6:25 PM Sergio Chong Loo >> >>> <schong...@apple.com.invalid> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> As part of our work with Flink, we identified the need for a >> solution >> >>> to have minimal “downtime” when re-deploying pipelines. This is >> >> especially >> >>> the case when the startup times are high and lead to downtime that’s >> >>> unacceptable for certain workloads. >> >>>>>> >> >>>>>> One solution we thought about is the notion of Blue/Green >> deployments >> >>> which I conceptually describe in this email along with a potential >> list >> >> of >> >>> modifications, the majority within the Flink K8s Operator. >> >>>>>> >> >>>>>> This is open for discussion, we’d love to get feedback from the >> >>> community. >> >>>>>> >> >>>>>> Thanks! >> >>>>>> Sergio Chong >> >>>>>> >> >>>>>> >> >>>>>> Proposal to add Blue/Green Deployments support to the Flink K8s >> >>> Operator >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> Problem Statement >> >>>>>> >> >>>>>> As stateful applications, Flink pipelines require the data stream >> >> flow >> >>> to be halted during deployment operations. For some cases stopping >> this >> >>> data flow could have a negative impact on downstream consumers. The >> >>> following figure helps illustrate this: >> >>>>>> >> >>>>>> A Blue/Green deployment architecture, or Active/StandBy, can help >> >>> overcome this issue by having 2 identical pipelines running >> side-by-side >> >>> with only one (the Active) producing or outputting records at all >> times. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> Moreover, since the pipelines are the ones “handling” each and >> every >> >>> record, we explore the idea of empowering the pipeline to decide, at >> the >> >>> record level, what data goes through and what doesn’t (by means of a >> >> “Gate” >> >>> component). >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> Proposed Modifications >> >>>>>> >> >>>>>> Give the the Flink K8s Operator the capability of managing the >> >>> lifecycle of these deployments. >> >>>>>> A new CRD (e.g. FlinkBlueGreenDeployment) will be introduced >> >>>>>> A new Controller (e.g. FlinkBlueGreenDeploymentController) to >> manage >> >>> this CRD and hide from the user the details of the actual Blue/Green >> >>> (Active/StandBy) jobs. >> >>>>>> Delegate the lifecycle of the actual Jobs to the existing >> >>> FlinkDeployment controller. >> >>>>>> At all times the controller “knows” which deployment type is/will >> >>> become Active, from which point in time. This implies an Event >> Watermark >> >>> driven implementation but with an extensible architecture to easily >> >> support >> >>> other use cases. >> >>>>>> A ConfigMap will be used to both contain the deployment parameters >> as >> >>> well as act as a communication conduit between the Controller and the >> two >> >>> Jobs. >> >>>>>> This new Controller will work in conjunction with a custom Flink >> >>> Process Function (potentially called GateProcessFunction) on the >> >>> pipeline/client side. This function will listen for changes to the >> above >> >>> mentioned ConfigMap and each job will know if it’s meant to be Active >> or >> >>> Standby, at record level. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>> >> >>> >> >>> >> >> >> >> Unless otherwise stated above: >> >> >> >> IBM United Kingdom Limited >> >> Registered in England and Wales with number 741598 >> >> Registered office: Building C, IBM Hursley Office, Hursley Park Road, >> >> Winchester, Hampshire SO21 2JN >> >> >> >>