Hi Peng Yuan! The repo is already created: https://github.com/apache/flink-kubernetes-operator
We will open the PR with the initial prototype later today, stay tuned in this thread! :)

Cheers,
Gyula

On Mon, Feb 14, 2022 at 9:09 AM K Fred <yuanpengf...@gmail.com> wrote:

Hi All,

Has the flink-kubernetes-operator project been created on GitHub?

Peng Yuan

On Wed, Feb 9, 2022 at 1:23 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

I agree with flink-kubernetes-operator as the repo name :) Don't have any better idea.

Gyula

On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <t...@apache.org> wrote:

Hi,

Thanks for the continued feedback and discussion. Looks like we are ready to start a VOTE, I will initiate it shortly.

In parallel it would be good to settle on the repository name. My suggestion would be: flink-kubernetes-operator

I thought "flink-operator" could be a bit misleading since the term operator already has a meaning in Flink. I also considered "flink-k8s-operator", but that would be almost identical to existing operator implementations and could lead to confusion in the future.

Thoughts?

Thanks,
Thomas

On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi Danny,

So far we have been focusing our dev efforts on the initial native implementation with the team. If the discussion and vote go well for this FLIP, we are looking forward to contributing the initial version sometime next week (fingers crossed).

At that point I think we can already start the dev work to support the standalone mode as well, especially if you can dedicate some effort to pushing that side. Working together on this sounds like a great idea and we should start as soon as possible! :)

Cheers,
Gyula

On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <dannycran...@apache.org> wrote:

I have been discussing this one with my team. We are interested in the standalone mode and are willing to contribute towards the implementation. Potentially we can work together to support both modes in parallel?

Thanks,

On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi Danny!

Thanks for the feedback :)

Versioning:
Versioning will be independent of Flink, and each operator version will depend on a fixed Flink version. This is the exact same setup as with Stateful Functions (https://github.com/apache/flink-statefun): an independent release cycle, but still under the Flink umbrella.

Deployment error handling:
I think that's a very good point, as general exception handling for the different failure scenarios is a tricky problem. I think exception classifiers and retry strategies could avoid a lot of manual intervention from the user. We will definitely need to add something like this. Once we have the repo created with the initial operator code we should open some tickets for this and put it on the short-term roadmap!
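For illustration only: a minimal Java sketch of what such pluggable classifiers and retry strategies could look like on the operator side. All names below are hypothetical and not taken from the FLIP or any existing code.

import java.time.Duration;
import java.util.Optional;

/** Decides whether a given submission/deployment failure is worth retrying at all. */
interface ExceptionClassifier {
    boolean isRecoverable(Throwable error);
}

/** Computes the delay before the next attempt, or empty once the operator should give up. */
interface RetryStrategy {
    Optional<Duration> nextDelay(int attempt);
}

/** One possible strategy: capped exponential back-off with a bounded number of attempts. */
final class ExponentialBackoff implements RetryStrategy {
    private final Duration initialDelay;
    private final Duration maxDelay;
    private final int maxAttempts;

    ExponentialBackoff(Duration initialDelay, Duration maxDelay, int maxAttempts) {
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Optional<Duration> nextDelay(int attempt) {
        if (attempt >= maxAttempts) {
            return Optional.empty(); // give up and surface the error for manual handling
        }
        long factor = 1L << Math.min(attempt, 20); // cap the shift to avoid overflow
        long delayMs = Math.min(initialDelay.toMillis() * factor, maxDelay.toMillis());
        return Optional.of(Duration.ofMillis(delayMs));
    }
}

In the reconcile loop, the operator could consult the classifier on a failed submission and, if the error is recoverable, schedule the next attempt after the strategy's delay; non-recoverable errors could be surfaced in the CR status for manual intervention.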
Cheers,
Gyula

On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <dannycran...@apache.org> wrote:

Hey team,

Great work on the FLIP, I am looking forward to this one. I agree that we can move forward to the voting stage.

I have general feedback around how we will handle job submission failure and retry. As discussed in the Rejected Alternatives section, we can use Java to handle job submission failures from the Flink client. It would be useful to have the ability to configure exception classifiers and a retry strategy as part of the operator configuration.

Given this will be in a separate GitHub repository, I am curious how the versioning strategy will work in relation to the Flink version. Do we have any other components with a similar setup I can look at? Will the operator version track Flink, or will it use its own versioning strategy with a Flink version support matrix, or similar?

Thanks,

On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <balassi.mar...@gmail.com> wrote:

Hi team,

Thank you for the great feedback, Thomas has updated the FLIP page accordingly. If you are comfortable with the current design and depth of the FLIP [1], I suggest moving forward to the voting stage - once that reaches a positive conclusion it lets us create the separate code repository under the Flink project for the operator.

I encourage everyone to keep improving the details in the meantime; however, given the existing design and the general sentiment on this thread, I believe the most efficient path from here is starting the implementation so that we can collectively iterate over it.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator

On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <t...@apache.org> wrote:

Hi Xintong,

Thanks for the feedback, please see responses inline below.

On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks Thomas for drafting this FLIP, and everyone for the discussion.
> I also have a few questions and comments.
> ## Job Submission
> Deploying a Flink session cluster via kubectl & CR and then submitting jobs to the cluster via the Flink CLI / REST is probably the approach that requires the least effort. However, I'd like to point out two weaknesses.
> 1. A lot of users use Flink in per-job/application mode. For these users, having to run the job in two steps (deploy the cluster, then submit the job) is not that convenient.
> 2. One of our motivations is being able to manage Flink applications' lifecycles with kubectl. Submitting jobs from the CLI is not aligned with this motivation.
> I think it's probably worth supporting job submission via kubectl & CR in the first version, both together with deploying the cluster (as in per-job/application mode) and after deploying the cluster (as in session mode).

The intention is to support application management through the operator and CR, which means there won't be any two-step submission process, which, as you allude to, would defeat the purpose of this project. The CR example shows the application part. Please note that the bare cluster support is an *additional* feature for scenarios that require external job management. Is there anything on the FLIP page that creates a different impression?
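To illustrate the "application part" of the CR: a rough sketch, with purely hypothetical field names, of the kind of spec a single application-mode resource could carry, so that one kubectl apply creates the cluster and submits the job, and kubectl delete tears both down.

import java.util.List;
import java.util.Map;

/**
 * Hypothetical shape of the spec an application-mode CR could carry.
 * Field names are illustrative only and not taken from the FLIP's schema.
 */
final class FlinkApplicationSpec {
    String image;                            // Flink + user code container image
    String flinkVersion;                     // e.g. "1.14"
    Map<String, String> flinkConfiguration;  // flink-conf.yaml style overrides
    JobSpec job;                             // present for application mode; omitted for a bare session cluster

    static final class JobSpec {
        String jarURI;                       // where the operator finds the user jar
        String entryClass;                   // main class of the job
        List<String> args;                   // program arguments
        int parallelism;
    }
}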
> ## Versioning
> Which Flink versions does the operator plan to support?
> 1. Native K8s deployment was first introduced in Flink 1.10
> 2. Native K8s HA was introduced in Flink 1.12
> 3. Pod template support was introduced in Flink 1.13
> 4. There were some changes to the Flink docker image entrypoint script in, IIRC, Flink 1.13

Great, thanks for providing this. It is important for the compatibility story going forward as well. We are targeting Flink 1.14.x upwards. Before the operator is ready there will be another Flink release. Let's see if anyone is interested in earlier versions?

> ## Compatibility
> What kind of API compatibility can we commit to? It's probably fine to have alpha / beta version APIs that allow incompatible future changes for the first version. But eventually we would need to guarantee backwards compatibility, so that an early-version CR can work with a newer-version operator.

Another great point, and please let me include that on the FLIP page. ;-)

I think we should allow incompatible changes for the first one or two versions, similar to how other major features have evolved recently, such as FLIP-27.

Would be great to get broader feedback on this one.

Cheers,
Thomas
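On the compatibility point, a hypothetical sketch (all names invented for illustration) of one way an early-version CR could keep working with a newer operator: the operator converts the old spec into the current schema before reconciling, filling new fields with conservative defaults.

/** Spec as it looked in the first (alpha) API version. */
final class SpecV1Alpha1 {
    String image;
    String jarURI;
    int parallelism;
}

/** Current spec; a field was added in the newer API version. */
final class SpecV1 {
    String image;
    String jarURI;
    int parallelism;
    String upgradeMode; // added later

    /** Old resources keep working because new fields get safe defaults. */
    static SpecV1 fromV1Alpha1(SpecV1Alpha1 old) {
        SpecV1 spec = new SpecV1();
        spec.image = old.image;
        spec.jarURI = old.jarURI;
        spec.parallelism = old.parallelism;
        spec.upgradeMode = "stateless"; // hypothetical default
        return spec;
    }
}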
> Thank you~
> Xintong Song

On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <t...@apache.org> wrote:

Thanks for the feedback!

> # 1 Flink Native vs Standalone integration
> Maybe we should make this more clear in the FLIP, but we agreed to do the first version of the operator based on the native integration. While this clearly does not cover all use cases and requirements, it seems this would lead to a much smaller initial effort and a nicer first version.

I'm also leaning towards the native integration, as long as it reduces the MVP effort. Ultimately the operator will need to also support the standalone mode. I would like to gain more confidence that native integration reduces the effort: while it cuts the effort to handle TM pod creation, some mapping code from the CR to the native integration client and config still needs to be created. As mentioned in the FLIP, native integration requires the Flink job manager to have access to the k8s API to create pods, which in some scenarios may be seen as unfavorable.

> # Pod Template
> Is the pod template in the CR the same as what Flink already supports[4]? If so, I am afraid that arbitrary fields (e.g. cpu/memory resources) may not take effect.

Yes, the pod template would look almost identical. There are a few settings that the operator will control (and that may need to be blacklisted), but in general we would not want to place restrictions. I think a mechanism where a pod template is merged from multiple layers would also be interesting, to make this more flexible.

Cheers,
Thomas
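For illustration of the layered-merge idea: a minimal sketch in plain Java over nested maps, with hypothetical key names and precedence rules, where the user-provided template is overlaid on a base template and operator-controlled settings always win.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of a layered pod-template merge: a base template from the operator,
 * overlaid with the user-provided template, while operator-controlled settings
 * are kept out of the user's hands. Structure and key names are illustrative.
 */
final class PodTemplateMerger {
    // Settings the operator owns; user overrides for these are ignored ("blacklisted").
    private static final Set<String> OPERATOR_CONTROLLED = Set.of("serviceAccountName", "restartPolicy");

    /** Deep-merges overlay into base; overlay wins, except for operator-controlled keys. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> overlay) {
        Map<String, Object> result = new HashMap<>(base);
        overlay.forEach((key, value) -> {
            if (OPERATOR_CONTROLLED.contains(key)) {
                return; // keep the operator's value for controlled settings
            }
            Object existing = result.get(key);
            if (existing instanceof Map && value instanceof Map) {
                result.put(key, merge((Map<String, Object>) existing, (Map<String, Object>) value));
            } else {
                result.put(key, value); // scalars and lists: the user layer overrides
            }
        });
        return result;
    }
}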