Hi Sergio,

Regarding the Kubernetes events: it would be great to integrate
ResourceQuotas
<https://kubernetes.io/docs/concepts/policy/resource-quotas/> into the
Kubernetes operator as part of this initiative. Then we would know up front
whether the namespace has enough quota to double the job's resources and
perform the upgrade.
We'd still need to handle cluster-wide resources, but in our experience we
run out of quota at the namespace level much more often than cluster-wide.
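
A minimal sketch of that namespace-level check, assuming the quota's hard
limits and current usage have already been read from the ResourceQuota status
and converted to plain integers. The Resources type and the quota_shortfall
function are hypothetical names for illustration, not operator APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resource amounts as plain integers (hypothetical, simplified units)."""
    cpu_millicores: int
    memory_mib: int


def quota_shortfall(hard: Resources, used: Resources, job: Resources) -> dict:
    """Return the quota missing, per resource, to start a second copy of `job`.

    `used` is assumed to already include the running (first) deployment.
    An empty dict means the namespace quota has room for both deployments.
    """
    shortfall = {}
    for field in ("cpu_millicores", "memory_mib"):
        free = getattr(hard, field) - getattr(used, field)
        missing = getattr(job, field) - free
        if missing > 0:
            shortfall[field] = missing
    return shortfall


hard = Resources(cpu_millicores=16000, memory_mib=65536)
used = Resources(cpu_millicores=10000, memory_mib=30000)
job = Resources(cpu_millicores=8000, memory_mib=16384)

# CPU has only 6000m free but the second deployment needs 8000m.
print(quota_shortfall(hard, used, job))  # {'cpu_millicores': 2000}
```

A non-empty result would let the operator refuse the transition early instead
of waiting for the timeout.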

Great FLIP though!
Ryan van Huuksloot
Sr. Production Engineer | Streaming Platform


On Mon, Mar 24, 2025 at 12:51 PM Sergio Chong Loo
<schong...@apple.com.invalid> wrote:

> Hi Rui,
>
> Great question, yes, that’s been taken into account.
>
> During the transition we check whether the resources are ready, subject to
> a timeout. We also monitor the Kubernetes events and keep track of anything
> abnormal. If the transition times out, it is aborted, the status is patched,
> and we raise the error along with the details. The first job continues its
> normal processing and the second job is left untouched so it can be
> examined.
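
The timeout handling described here could be sketched roughly as follows. This
is only an illustration under assumptions: Outcome and await_transition are
hypothetical names, and is_ready stands in for whatever readiness check the
controller actually performs.

```python
import time
from enum import Enum
from typing import Callable


class Outcome(Enum):
    COMPLETED = "completed"  # second deployment became ready in time
    ABORTED = "aborted"      # timed out; both deployments are left as they are


def await_transition(
    is_ready: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Poll `is_ready` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return Outcome.COMPLETED
        if clock() >= deadline:
            return Outcome.ABORTED
        sleep(poll_s)


# A second deployment that never becomes ready leads to an abort.
print(await_transition(lambda: False, timeout_s=0.05, poll_s=0.01))
```

On ABORTED the caller would patch the status and raise the error, without
deleting either deployment; only on COMPLETED would the first deployment be
removed.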
>
> Hope this answers your question
>
> Thanks,
> - Sergio
>
> > On Mar 20, 2025, at 7:02 PM, Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Sorry for the late response.
> >
> > Thanks Sergio and Gyula for driving this proposal, it's really useful
> > for reducing the downtime when restarting or upgrading the job.
> >
> > I have a question about this FLIP:
> > As mentioned in the Event Sequence for a Blue/Green part of the FLIP,
> > deployment A will be deleted once B is running successfully.
> >
> > That means one job needs double the resources during redeployment, right?
> > If so, is there a timeout mechanism for when resources are insufficient?
> >
> > For example, the Kubernetes cluster or namespace may not have
> > any spare resources at the moment. Generally, if the old deployment A is
> > deleted first, there are enough resources to start the new deployment B.
> >
> > But if deployment A is deleted only once B is running successfully, and
> > there are not enough resources for B, then B can never run successfully and
> > deployment A never stops. It's like a deadlock: A is waiting for B to run,
> > and B is waiting for A to release resources.
> >
> > Introducing a timeout mechanism for A would mean A still stops if B is
> > not running within the timeout.
> >
> > Please correct me if my understanding is wrong, thanks~
> >
> > Best,
> > Rui
> >
> >
> > On Tue, Mar 11, 2025 at 10:01 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> >> I think we should proceed with the vote :)
> >>
> >> Let me start the voting thread.
> >>
> >>
> >> On Tue, Mar 11, 2025 at 2:56 PM Sergio Chong Loo
> >> <schong...@apple.com.invalid> wrote:
> >>
> >>> @Gyula,
> >>>
> >>> Thanks for the input. I also second the “blue/green” naming convention,
> >>> and yes, neither color is meant to have any meaning or purpose other
> >>> than distinction.
> >>>
> >>> @Alexis,
> >>>
> >>> Indeed, so far the proposal/doc suggests a FlinkBlueGreenDeployment CRD
> >>>
> >>> Sergio
> >>>
> >>>> On Mar 6, 2025, at 9:12 AM, Alexis Sarda-Espinosa <sarda.espin...@gmail.com> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I had also thought about this kind of functionality in the past and I'm
> >>>> very interested to see how it works out. I had imagined something like a
> >>>> FlinkContinuousDeployment as CRD, just putting it out there.
> >>>>
> >>>> Regards,
> >>>> Alexis.
> >>>>
> >>>> On Thu, 6 Mar 2025, 17:31 Gyula Fóra, <gyula.f...@gmail.com> wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> I think we should consider either FlinkAbDeployment or
> >>>>> FlinkBlueGreenDeployment as a name and then label deployments and states
> >>>>> with a/b or blue/green accordingly.
> >>>>>
> >>>>> I have a slight preference for blue/green as it sounds a bit nicer and
> >>>>> more descriptive, but it depends a bit on whether the concept has any
> >>>>> strong relation to which one should be the active one (does green always
> >>>>> have to be the "new" one?).
> >>>>>
> >>>>> In any case I think the proposal is pretty clear and we should go ahead
> >>>>> with this if there are no more discussion points from the community :)
> >>>>>
> >>>>> I can start the vote on Monday.
> >>>>>
> >>>>> Cheers,
> >>>>> Gyula
> >>>>>
> >>>>> On Tue, Feb 11, 2025 at 4:03 PM Sergio Chong Loo
> >>>>> <schong...@apple.com.invalid> wrote:
> >>>>>
> >>>>>> Hi Gyula,
> >>>>>>
> >>>>>> Great questions, I’ll track these topics in our docs accordingly as well.
> >>>>>>
> >>>>>>> - What will be the naming convention for the created FlinkDeployment A/B?
> >>>>>>> Should we introduce some logic for the users to control this?
> >>>>>>
> >>>>>>
> >>>>>> Currently, the controller takes the original resource name as the main
> >>>>>> prefix and adds the “-a” or “-b” suffixes (in an alternating fashion) to
> >>>>>> distinguish them. We could switch this to a numeric pattern.
> >>>>>>
> >>>>>> We could indeed allow the user to have some control over the deployments’
> >>>>>> name prefixes or even the _type_ of suffixes. Thoughts?
> >>>>>>
> >>>>>>> - Can the user "turn" an existing FlinkDeployment into a Blue/Green
> >>>>>>> deployment?
> >>>>>>
> >>>>>> This is a very good idea. We could introduce a “flag” in the CRD that
> >>>>>> would instruct the controller to treat an existing FlinkDeployment as an
> >>>>>> “-a” type and proceed to redeploy it as a Blue/Green deployment instead.
> >>>>>>
> >>>>>>> - Did you consider alternative names for this CR?
> >>>>>>
> >>>>>> This is one of the most open topics. Some other ideas were
> >>>>>> “Active/Standby” or “Rolling Deployments”… “Blue/Green” simply stuck a
> >>>>>> bit more. Any other suggestions?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Sergio
> >>>>>>
> >>>>>>
> >>>>>>> On Feb 9, 2025, at 5:17 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Sergio!
> >>>>>>>
> >>>>>>> I think this will be a great addition to the operator and is a feature
> >>>>>>> request that comes up again and again.
> >>>>>>>
> >>>>>>> Some minor comments/questions:
> >>>>>>> - What will be the naming convention for the created FlinkDeployment A/B?
> >>>>>>> Should we introduce some logic for the users to control this?
> >>>>>>> - Can the user "turn" an existing FlinkDeployment into a Blue/Green
> >>>>>>> deployment?
> >>>>>>> - Did you consider alternative names for this CR?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Gyula
> >>>>>>>
> >>>>>>> On Fri, Jan 24, 2025 at 6:00 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Eric,
> >>>>>>>>
> >>>>>>>> The link is fixed and the FLIP contains everything from the Google Doc.
> >>>>>>>> I updated the link there as well.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Gyula
> >>>>>>>>
> >>>>>>>> On Fri, Jan 24, 2025 at 5:55 PM Eric Xiao <eric.x...@decodable.co.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Sergio,
> >>>>>>>>>
> >>>>>>>>> Can you update the Phase 1 Google Doc's sharing permissions? I also
> >>>>>>>>> believe the link in the FLIP leads to an internal Apple tool:
> >>>>>>>>>
> >>>>>>>>> https://quip-apple.com/account/login?next=https%3A%2F%2Fquip-apple.com%2F7BpiAdeZ7Ow3
> >>>>>>>>>
> >>>>>>>>> On Tue, Jan 14, 2025 at 12:15 PM Sergio Chong Loo
> >>>>>>>>> <schong...@apple.com.invalid> wrote:
> >>>>>>>>>
> >>>>>>>>>> FLIP-503:
> >>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677648
> >>>>>>>>>>
> >>>>>>>>>> - Sergio
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Jan 13, 2025, at 2:39 PM, Sergio Chong Loo <schong...@apple.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> As proposed in [1] we would like to more formally continue the
> >>>>>>>>>>> discussion to add Blue/Green deployments support to Flink via the
> >>>>>>>>>>> Kubernetes Operator.
> >>>>>>>>>>>
> >>>>>>>>>>> For clarity and easier review experience we’ve separated this effort
> >>>>>>>>>>> into 2 phases:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Blue/Green Deployments for Flink on Kubernetes: Phase 1 (basic) -
> >>>>>>>>>>> THIS FLIP
> >>>>>>>>>>>
> >>>>>>>>>>> 2) Blue/Green Deployments for Flink on Kubernetes: Phase 2 (with
> >>>>>>>>>>> Coordination) - in its corresponding FLIP/email, which will follow
> >>>>>>>>>>> shortly
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Phase 1 Google Doc:
> >>>>>>>>>>> https://docs.google.com/document/d/159I9kPmHkPMNoKp7iIgntMZjrGz5J2_svOfuaNvV5HA/edit?pli=1&tab=t.0
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks everyone in advance, we’re really excited to bring this
> >>>>>>>>>>> feature to the community!
> >>>>>>>>>>>
> >>>>>>>>>>> - Sergio
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://lists.apache.org/thread/m2sqgz455fzlvp0h9kbs1zmc5gj2s162
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >>
>
>
