Hey Danny,

Thanks for digging deeper into this topic!

Indeed, we’ve given thought to most of these points, but you’ve raised a 
couple of interesting ones. Here are some ideas (inline):


> On Nov 27, 2024, at 4:00 AM, Danny Cranmer <dannycran...@apache.org> wrote:
> 
> Hello Sergio,
> 
> Thank you for starting this discussion, I have a few questions.
> 
> > having 2 identical pipelines running side-by-side
> How do you ensure correctness between the 2 identical pipelines? For example, 
> processing time semantics or late data can result in different outputs.
> Some Sources have consumption quotas (for example Kinesis Data Streams); this 
> may end up eating into that quota and cause problems.
> How do we handle sources like Queues when they cannot be consumed 
> concurrently?

Yes, for this analysis we’ve been staying away from Processing Time since it’s 
difficult (even for a single pipeline) to “replay” it idempotently. The most 
stable scenario so far has been Event Time with Watermarks (the solution will 
be extensible to accommodate other non-Watermark scenarios).

The Blue deployment should start from Green’s most recent Checkpoint; that way 
we minimize the amount of time it needs to “catch up”. With Event Time it’s 
easier to ensure that the catch-up portion will be replayed virtually the same 
way as Green’s.

Since startup times, checkpointing intervals and the overall nature of the 
data are strictly business specific, the user should have a configurable way 
(in this case via a “future” Watermark value) to indicate when the transition 
between Blue/Green will occur. In other words, both pipelines share the same 
configured “future” transition Watermark and both “see” the exact same 
events/records, so the Blue/Green Gates can start/stop record emission as soon 
as the configured Watermark is reached… the gates are mutually exclusive, so 
each record should pass through one gate or the other (a minimal sketch of 
this gate logic is included below).

I don’t have experience yet with sources that cannot be consumed concurrently; 
this will be a good one to analyze.
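
To make the gate idea concrete, here’s a minimal sketch of what such a 
mutually exclusive gate could look like. The class name, the isBlue flag and 
the statically configured transition value are all illustrative; the real 
version would read the transition Watermark from the ConfigMap and handle 
idle channels:

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

/**
 * Sketch of the "Gate": each record is emitted by exactly one of the two
 * pipelines, decided by comparing the current watermark against a
 * pre-agreed "future" transition watermark shared by both jobs.
 */
public class GateProcessFunction<T> extends ProcessFunction<T, T> {

    private final boolean isBlue;            // true for the new (Blue) job
    private final long transitionWatermark;  // agreed cutover point (epoch millis)

    public GateProcessFunction(boolean isBlue, long transitionWatermark) {
        this.isBlue = isBlue;
        this.transitionWatermark = transitionWatermark;
    }

    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        // Per-subtask decision: Green emits while the watermark is still
        // below the transition value, Blue emits once it has crossed it.
        // The two conditions are mutually exclusive, so each record passes
        // through exactly one gate.
        boolean pastTransition =
            ctx.timerService().currentWatermark() >= transitionWatermark;
        if (isBlue == pastTransition) {
            out.collect(value);
        }
    }
}

Idle subtasks would still need the usual watermark idleness handling so the 
per-subtask watermark keeps advancing toward the transition value.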

> 
> > we explore the idea of empowering the pipeline to decide, at the record 
> > level, what data goes through and what doesn’t (by means of a “Gate” 
> > component).
> What happens if the Green job gets ahead of the Blue job? How will you pick a 
> stopping point (per channel) to stop at consistently (checkpoint barriers 
> solve this already, right?). If you use Point in (event) Time, you need to 
> handle idle channels.

Hopefully the answer above addresses this; if not, I’m happy to add more 
clarification.

> 
> > This new Controller will work in conjunction with a custom Flink Process 
> > Function
> Would this rely on the user including the operator in the job graph, or would 
> you somehow dynamically inject it?
> Where would you put this operator in complex Job graphs or forests? 

Yes, initially an idea is to have a reusable ProcessFunction that the user can 
place in their pipelines wherever they see fit. So far it seems like a good 
idea to place it towards the end of the pipeline, e.g. right before a sink; 
that way the majority of the job’s state can be exploited and preserved until 
we absolutely know we can tear the old Green pipeline down… or roll back.

Another idea, though it would be more intrusive and harder to test in an 
initial iteration, is to add this as base Sink functionality; this way the 
sink itself could know whether to write the record(s) or not.
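
For illustration, the wiring could look like this (GateDemo, the hard-coded 
gate parameters and the toy source/sink are all made up; the point is just 
that the gate sits last, right before the sink):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GateDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")          // stand-in for the real source
           // Gate as late as possible, right before the sink, so all upstream
           // state keeps being built identically in both jobs.
           .process(new GateProcessFunction<>(true, 1_700_000_000_000L))
           .print();                             // stand-in for the real sink

        env.execute("blue-green-gate-demo");
    }
}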

> 
> > A ConfigMap will be used to both contain the deployment parameters as well 
> > as act as a communication conduit between the Controller and the two Jobs.
> Will a config map write/read cycle be fast enough? If you write into the CM 
> "stop at record x" then the other Job might read past record x before it sees 
> the command

Oh, for sure this can happen. That’s why the user should have the capability 
to define the custom “future” cutover point, because the time span their 
pipeline needs will be entirely business specific. For safety, our 
ProcessFunction could even enforce a minimum default “time to wait” or 
“minimum Watermark transition value”, measured from the moment the pipeline 
started processing records, or something similar.
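
As a sketch of that safety idea (the class name, the 5-minute default lead 
and the millisecond units are all assumptions, not a settled design):

import java.time.Duration;

/**
 * Illustrative safety check: reject a configured transition Watermark that
 * isn't far enough in the future for both jobs to observe it before it
 * passes.
 */
public final class TransitionValidator {

    // Illustrative default; the real value would be business specific.
    private static final Duration MIN_LEAD = Duration.ofMinutes(5);

    public static void validate(long transitionWatermark, long currentWatermark) {
        long leadMillis = transitionWatermark - currentWatermark;
        if (leadMillis < MIN_LEAD.toMillis()) {
            throw new IllegalArgumentException(
                "Transition Watermark must lie at least " + MIN_LEAD
                    + " beyond the current Watermark, but the lead was only "
                    + leadMillis + " ms");
        }
    }
}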

> 
> > This function will listen for changes to the above mentioned ConfigMap and 
> > each job will know if it’s meant to be Active or Standby, at record level.
> How will this work with exactly once sinks when you might have open 
> transactions?

The same principle applies: records should be written to either Blue’s 
transaction or Green’s transaction.

Another way to illustrate these components is:

        - Once a transition is initiated, Green’s Gate PF becomes StandBy and 
          Blue’s Gate PF becomes Active; however, they gradually open or close 
          (mutually exclusively) in this fashion:

                - Green’s Gate will taper off the emission of records as soon 
                  as each subtask’s Watermark crosses the configured transition 
                  value
                - Blue’s Gate will gradually start emitting records as soon as 
                  each subtask’s Watermark crosses the configured transition 
                  value
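
For completeness, a rough sketch of how a Gate could learn about the 
transition from the ConfigMap, using the fabric8 client (which the operator 
already builds on); the namespace, ConfigMap name and data key are made up 
for illustration:

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

/** Illustrative only: surfaces the latest transition Watermark to the Gate. */
public class ConfigMapGateListener implements AutoCloseable {

    private final AtomicLong transitionWatermark = new AtomicLong(Long.MAX_VALUE);
    private final KubernetesClient client = new KubernetesClientBuilder().build();

    public void start() {
        client.configMaps()
              .inNamespace("flink")                    // hypothetical namespace
              .withName("bluegreen-deployment-params") // hypothetical CM name
              .watch(new Watcher<ConfigMap>() {
                  @Override
                  public void eventReceived(Action action, ConfigMap cm) {
                      Map<String, String> data = cm.getData();
                      String value = data == null
                          ? null
                          : data.get("transitionWatermark"); // hypothetical key
                      if (value != null) {
                          transitionWatermark.set(Long.parseLong(value));
                      }
                  }

                  @Override
                  public void onClose(WatcherException cause) {
                      // A real implementation would re-establish the watch here.
                  }
              });
    }

    public long currentTransitionWatermark() {
        return transitionWatermark.get();
    }

    @Override
    public void close() {
        client.close();
    }
}

Each Gate subtask would then compare its current Watermark against 
currentTransitionWatermark() when deciding whether to emit.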

> 
> Thanks,
> Danny
> 
> On Mon, Nov 25, 2024 at 6:25 PM Sergio Chong Loo 
> <schong...@apple.com.invalid> wrote:
>> 
>> Hi,
>> 
>> As part of our work with Flink, we identified the need for a solution to 
>> have minimal “downtime” when re-deploying pipelines. This is especially the 
>> case when the startup times are high and lead to downtime that’s 
>> unacceptable for certain workloads.
>> 
>> One solution we thought about is the notion of Blue/Green deployments which 
>> I conceptually describe in this email along with a potential list of 
>> modifications, the majority within the Flink K8s Operator.
>> 
>> This is open for discussion, we’d love to get feedback from the community.
>> 
>> Thanks!
>> Sergio Chong
>> 
>> 
>> Proposal to add Blue/Green Deployments support to the Flink K8s Operator
>> 
>> 
>> 
>> Problem Statement
>> 
>> As stateful applications, Flink pipelines require the data stream flow to be 
>> halted during deployment operations. In some cases, stopping this data flow 
>> could have a negative impact on downstream consumers. [Figure illustrating 
>> the deployment downtime impact omitted from the plain-text archive.]
>> 
>> A Blue/Green deployment architecture, or Active/StandBy, can help overcome 
>> this issue by having 2 identical pipelines running side-by-side with only 
>> one (the Active) producing or outputting records at all times.
>> 
>> 
>> 
>> Moreover, since the pipelines are the ones “handling” each and every record, 
>> we explore the idea of empowering the pipeline to decide, at the record 
>> level, what data goes through and what doesn’t (by means of a “Gate” 
>> component).
>> 
>> 
>> 
>> 
>> Proposed Modifications
>> 
>> - Give the Flink K8s Operator the capability of managing the lifecycle of 
>>   these deployments.
>> - A new CRD (e.g. FlinkBlueGreenDeployment) will be introduced.
>> - A new Controller (e.g. FlinkBlueGreenDeploymentController) will manage 
>>   this CRD and hide from the user the details of the actual Blue/Green 
>>   (Active/StandBy) jobs.
>> - The lifecycle of the actual Jobs will be delegated to the existing 
>>   FlinkDeployment controller.
>> - At all times the controller “knows” which deployment type is/will become 
>>   Active, and from which point in time. This implies an Event Watermark 
>>   driven implementation, but with an extensible architecture to easily 
>>   support other use cases.
>> - A ConfigMap will be used to both contain the deployment parameters and 
>>   act as a communication conduit between the Controller and the two Jobs.
>> - This new Controller will work in conjunction with a custom Flink Process 
>>   Function (potentially called GateProcessFunction) on the pipeline/client 
>>   side. This function will listen for changes to the above mentioned 
>>   ConfigMap, and each job will know if it’s meant to be Active or Standby, 
>>   at record level.
>> 
>> 
>> 
