Back in 2020, there was a Flink Forward talk [1] about how Lyft was doing blue green deployments. Earlier (all the way back in 2017) Drivetribe described [2] how they were doing so as well.
David

[1] https://www.youtube.com/watch?v=Hyt3YrtKQAM
[2] https://www.ververica.com/blog/drivetribe-cqrs-apache-flink

On Thu, Aug 31, 2023 at 1:21 AM Nicolas Fraison via user <user@flink.apache.org> wrote:
>
> Definitely our intent is to start with an in house specific Blue Green
> operator and once we will reach some level of confidence we will open a FLIP
> to discuss it.
>
> Nicolas
>
> On Thu, Aug 31, 2023 at 10:12 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> The main concern as we discussed in previous mailing list threads before is
>> the general applicability of such solution:
>>
>> - Many production jobs cannot really afford running in parallel (starting
>> the second job while the first one is running), due to data
>> consistency/duplications reasons
>> - Exactly once sinks do not really support this
>>
>> So I think we should start with this maybe as an independent effort /
>> external library and if we see that it works we could discuss it in a FLIP.
>>
>> What do you think?
>> Gyula
>>
>> On Thu, Aug 31, 2023 at 9:23 AM Nicolas Fraison
>> <nicolas.frai...@datadoghq.com> wrote:
>>>
>>> Thanks Gyula for your feedback.
>>>
>>> We were also thinking of relying on such a solution, creating a dedicated
>>> crd/operator to manage this BlueGreenFlinkDeployment.
>>> Good to hear that it could be incorporated later in the operator.
>>>
>>> Will let you know once we will have something to share with you.
>>>
>>> Nicolas
>>>
>>> On Wed, Aug 30, 2023 at 4:28 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>
>>>> Hey!
>>>>
>>>> I don't know if anyone has implemented this or not but one way to approach
>>>> this problem (and this may not be the right way, just an idea :) ) is to
>>>> add a new Custom Resource type that sits on top of the FlinkDeployment /
>>>> FlinkSessionJob resources and add a small controller for this.
>>>>
>>>> This new custom resource, BlueGreenDeployment, would be somewhat similar
>>>> to how a Replicaset vs Pod works in Kubernetes. It would create a new
>>>> FlinkDeployment and would delete the old one once the new reached a
>>>> healthy running state.
>>>>
>>>> Adding a new CR allows us to not overcomplicate the existing
>>>> resource/controller loop but simply leverage it. If you prototype
>>>> something along these lines, please feel free to share and then we can
>>>> discuss if we want to incorporate something like this in the operator repo
>>>> in the future :)
>>>>
>>>> Cheers,
>>>> Gyula
>>>>
>>>> On Wed, Aug 30, 2023 at 1:21 PM Nicolas Fraison via user
>>>> <user@flink.apache.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> From https://issues.apache.org/jira/browse/FLINK-29199 it seems that
>>>>> support for blue green deployment will not be supported or will not
>>>>> happen soon.
>>>>>
>>>>> I'd like to know if some of you have built a custom mechanism on top of
>>>>> this operator to support the blue green deployment and if you would have
>>>>> any advice on implementing this?
>>>>>
>>>>> --
>>>>>
>>>>> Nicolas Fraison (he/him)
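
For anyone prototyping the idea quoted above (a BlueGreenDeployment resource that creates a new FlinkDeployment and deletes the old one once the new one is healthy), the control loop could look roughly like the sketch below. It is an in-memory illustration only: the names Deployment, BlueGreenState and reconcile, the "job-vN" naming, and the status strings are all hypothetical and are not part of the Flink Kubernetes operator API. A real controller would create and delete FlinkDeployment resources through the Kubernetes API rather than mutate local state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deployment:
    """Stand-in for a FlinkDeployment (hypothetical, not the operator's type)."""
    name: str
    spec_version: int
    status: str = "DEPLOYING"  # assumed statuses: DEPLOYING, RUNNING, FAILED


@dataclass
class BlueGreenState:
    """State owned by the hypothetical BlueGreenDeployment resource."""
    active: Optional[Deployment] = None     # deployment currently serving
    candidate: Optional[Deployment] = None  # new deployment being brought up


def reconcile(state: BlueGreenState, desired_version: int) -> List[str]:
    """Run one reconcile step and return the actions taken, in order."""
    actions: List[str] = []

    if state.candidate is not None:
        if state.candidate.status == "RUNNING":
            # The new deployment is healthy: only now delete the old one.
            if state.active is not None:
                actions.append(f"delete {state.active.name}")
            state.active = state.candidate
            state.candidate = None
            actions.append(f"promote {state.active.name}")
        elif state.candidate.status == "FAILED":
            # Roll back by dropping the candidate; the old one keeps running.
            actions.append(f"delete {state.candidate.name}")
            state.candidate = None
        # Still DEPLOYING: nothing to do, wait for the next reconcile.
        return actions

    if state.active is None or state.active.spec_version != desired_version:
        # First deployment, or the spec changed: start a candidate alongside
        # whatever is currently active.
        state.candidate = Deployment(f"job-v{desired_version}", desired_version)
        actions.append(f"create {state.candidate.name}")

    return actions
```

Between "create" and "promote" both deployments run at the same time, which is exactly the window the data consistency / duplication and exactly-once sink concerns raised in the thread apply to. The sketch also retries a failed candidate on the next reconcile without any backoff, which a real controller would likely want to limit.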