Hey Kevin!

I am not aware of anyone currently working on this for the Flink Operator.

Here are my current thoughts on the topic:

   1. It's not impossible to build this into the operator, but it would
   require considerable changes to the logic, both in the resource
   mapping and in the observer logic. However...
   2. It's a very niche use-case, and in most cases it is not required.
   3. Even if we implement it, there are a lot of caveats to making it
   generally useful outside of some very specialized use-cases.
   4. In most cases this is actually not a good way to perform upgrades,
   and depending on the application it may lead to incorrect results.
   5. It is possible to build this externally, on top of the current
   operator logic.

So at the moment I am slightly against the idea in general, but of course I
can be convinced otherwise if there is a general requirement or interest in
the community. In any case, we should be confident that this will actually
provide production value for many use-cases, and it would require a FLIP
for sure.

Cheers,
Gyula



On Wed, May 24, 2023 at 5:24 PM Kevin Lam <kevin....@shopify.com.invalid>
wrote:

> Hi,
>
> Is there any interest or ongoing work around supporting zero-downtime
> deployments with Flink using the Flink Operator?
>
> I saw that https://issues.apache.org/jira/browse/FLINK-24257 existed, but
> it looks a little stale. I'm interested in learning more about the current
> state of things.
>
> There is also some pre-existing work done by Lyft:
> https://www.youtube.com/watch?v=Hyt3YrtKQAM
>
