Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

K Fred Mon, 14 Feb 2022 00:21:16 -0800

Hi Gyula,

Thanks!
It's great to see the project getting started and I can't wait to see the
PR and start contributing code.😄😄😄


Best Wishes!
Peng Yuan

On Mon, Feb 14, 2022 at 4:14 PM Gyula Fóra <[email protected]> wrote:

> Hi Peng Yuan!
>
> The repo is already created:
> https://github.com/apache/flink-kubernetes-operator
>
> We will open the PR with the initial prototype later today, stay tuned in
> this thread! :)
>
> Cheers,
> Gyula
>
> On Mon, Feb 14, 2022 at 9:09 AM K Fred <[email protected]> wrote:
>
> > Hi All,
> >
> > Has the flink-kubernetes-operator project been created on GitHub?
> >
> > Peng Yuan
> >
> > On Wed, Feb 9, 2022 at 1:23 AM Gyula Fóra <[email protected]> wrote:
> >
> > > I agree with flink-kubernetes-operator as the repo name :)
> > > Don't have any better idea
> > >
> > > Gyula
> > >
> > > On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for the continued feedback and discussion. Looks like we are
> > > > ready to start a VOTE, I will initiate it shortly.
> > > >
> > > > In parallel it would be good to find the repository name.
> > > >
> > > > My suggestion would be: flink-kubernetes-operator
> > > >
> > > > I thought "flink-operator" could be a bit misleading since the term
> > > > operator already has a meaning in Flink.
> > > >
> > > > I also considered "flink-k8s-operator" but that would be almost
> > > > identical to existing operator implementations and could lead to
> > > > confusion in the future.
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > >
> > > > On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <[email protected]> wrote:
> > > > >
> > > > > Hi Danny,
> > > > >
> > > > > So far we have been focusing our dev efforts on the initial native
> > > > > implementation with the team.
> > > > > > If the discussion and vote go well for this FLIP we are looking
> > > > > > forward to contributing the initial version sometime next week
> > > > > > (fingers crossed).
> > > > >
> > > > > > At that point I think we can already start the dev work to support
> > > > > > the standalone mode as well, especially if you can dedicate some
> > > > > > effort to pushing that side.
> > > > > > Working together on this sounds like a great idea and we should
> > > > > > start as soon as possible! :)
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <[email protected]>
> > > > > > wrote:
> > > > >
> > > > > > > I have been discussing this one with my team. We are interested
> > > > > > > in the Standalone mode, and are willing to contribute towards the
> > > > > > > implementation.
> > > > > > > Potentially we can work together to support both modes in
> > > > > > > parallel?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <[email protected]>
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi Danny!
> > > > > > >
> > > > > > > Thanks for the feedback :)
> > > > > > >
> > > > > > > Versioning:
> > > > > > > > Versioning will be independent from Flink and the operator will
> > > > > > > > depend on a fixed Flink version (in every given operator
> > > > > > > > version).
> > > > > > > > This should be the exact same setup as with Stateful Functions
> > > > > > > > (https://github.com/apache/flink-statefun). So independent
> > > > > > > > release cycle but still within the Flink umbrella.
> > > > > > >
> > > > > > > Deployment error handling:
> > > > > > > > I think that's a very good point, as general exception handling
> > > > > > > > for the different failure scenarios is a tricky problem. I think
> > > > > > > > the exception classifiers and retry strategies could avoid a lot
> > > > > > > > of manual intervention from the user. We will definitely need to
> > > > > > > > add something like this. Once we have the repo created with the
> > > > > > > > initial operator code we should open some tickets for this and
> > > > > > > > put it on the short term roadmap!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gyula
> > > > > > >
> > > > > > > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <[email protected]>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey team,
> > > > > > > >
> > > > > > > > > Great work on the FLIP, I am looking forward to this one. I
> > > > > > > > > agree that we can move forward to the voting stage.
> > > > > > > >
> > > > > > > > > I have general feedback around how we will handle job
> > > > > > > > > submission failure and retry. As discussed in the Rejected
> > > > > > > > > Alternatives section, we can use Java to handle job submission
> > > > > > > > > failures from the Flink client. It would be useful to have the
> > > > > > > > > ability to configure exception classifiers and retry strategy
> > > > > > > > > as part of operator configuration.
> > > > > > > >
> > > > > > > > > Given this will be in a separate GitHub repository I am curious
> > > > > > > > > how the versioning strategy will work in relation to the Flink
> > > > > > > > > version? Do we have any other components with a similar setup I
> > > > > > > > > can look at? Will the operator version track Flink or will it
> > > > > > > > > use its own versioning strategy with a Flink version support
> > > > > > > > > matrix, or similar?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <[email protected]>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi team,
> > > > > > > > >
> > > > > > > > > > Thank you for the great feedback, Thomas has updated the FLIP
> > > > > > > > > > page accordingly. If you are comfortable with the currently
> > > > > > > > > > existing design and depth in the FLIP [1] I suggest moving
> > > > > > > > > > forward to the voting stage - once that reaches a positive
> > > > > > > > > > conclusion it lets us create the separate code repository
> > > > > > > > > > under the Flink project for the operator.
> > > > > > > > >
> > > > > > > > > > I encourage everyone to keep improving the details in the
> > > > > > > > > > meantime, however I believe given the existing design and the
> > > > > > > > > > general sentiment on this thread that the most efficient path
> > > > > > > > > > from here is starting the implementation so that we can
> > > > > > > > > > collectively iterate over it.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > >
> > > > > > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > Hi Xintong,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the feedback and please see responses below -->
> > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <[email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > Thanks Thomas for drafting this FLIP, and everyone for the
> > > > > > > > > > > > discussion.
> > > > > > > > > > >
> > > > > > > > > > > I also have a few questions and comments.
> > > > > > > > > > >
> > > > > > > > > > > ## Job Submission
> > > > > > > > > > > > Deploying a Flink session cluster via kubectl & CR and then
> > > > > > > > > > > > submitting jobs to the cluster via Flink CLI / REST is probably
> > > > > > > > > > > > the approach that requires the least effort. However, I'd like
> > > > > > > > > > > > to point out 2 weaknesses.
> > > > > > > > > > > > 1. A lot of users use Flink in per-job/application modes. For
> > > > > > > > > > > > these users, having to run the job in two steps (deploy the
> > > > > > > > > > > > cluster, and submit the job) is not that convenient.
> > > > > > > > > > > > 2. One of our motivations is being able to manage Flink
> > > > > > > > > > > > applications' lifecycles with kubectl. Submitting jobs from the
> > > > > > > > > > > > CLI does not sound aligned with this motivation.
> > > > > > > > > > > > I think it's probably worth it to support submitting jobs via
> > > > > > > > > > > > kubectl & CR in the first version, both together with deploying
> > > > > > > > > > > > the cluster like in per-job/application mode and after deploying
> > > > > > > > > > > > the cluster like in session mode.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > The intention is to support application management through
> > > > > > > > > > > operator and CR, which means there won't be any 2 step
> > > > > > > > > > > submission process, which as you allude to would defeat the
> > > > > > > > > > > purpose of this project. The CR example shows the application
> > > > > > > > > > > part. Please note that the bare cluster support is an
> > > > > > > > > > > *additional* feature for scenarios that require external job
> > > > > > > > > > > management. Is there anything on the FLIP page that creates a
> > > > > > > > > > > different impression?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > ## Versioning
> > > > > > > > > > > > Which Flink versions does the operator plan to support?
> > > > > > > > > > > > 1. Native K8s deployment was first introduced in Flink 1.10
> > > > > > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > > > > > 3. The Pod template support was introduced in Flink 1.13
> > > > > > > > > > > > 4. There were some changes to the Flink Docker image entrypoint
> > > > > > > > > > > > script in, IIRC, Flink 1.13
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Great, thanks for providing this. It is important for the
> > > > > > > > > > > compatibility going forward also. We are targeting Flink 1.14.x
> > > > > > > > > > > upwards. Before the operator is ready there will be another
> > > > > > > > > > > Flink release. Let's see if anyone is interested in earlier
> > > > > > > > > > > versions?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > ## Compatibility
> > > > > > > > > > > > What kind of API compatibility can we commit to? It's probably
> > > > > > > > > > > > fine to have alpha / beta version APIs that allow incompatible
> > > > > > > > > > > > future changes for the first version. But eventually we would
> > > > > > > > > > > > need to guarantee backwards compatibility, so that an early
> > > > > > > > > > > > version CR can work with a new version operator.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Another great point and please let me include that on the FLIP
> > > > > > > > > > > page. ;-)
> > > > > > > > > > >
> > > > > > > > > > > I think we should allow incompatible changes for the first one
> > > > > > > > > > > or two versions, similar to how other major features have
> > > > > > > > > > > evolved recently, such as FLIP-27.
> > > > > > > > > >
> > > > > > > > > > Would be great to get broader feedback on this one.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Thomas
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <[email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > # 1 Flink Native vs Standalone integration
> > > > > > > > > > > > > > Maybe we should make this more clear in the FLIP but we agreed
> > > > > > > > > > > > > > to do the first version of the operator based on the native
> > > > > > > > > > > > > > integration.
> > > > > > > > > > > > > > While this clearly does not cover all use-cases and
> > > > > > > > > > > > > > requirements, it seems this would lead to a much smaller
> > > > > > > > > > > > > > initial effort and a nicer first version.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > I'm also leaning towards the native integration, as long as it
> > > > > > > > > > > > > reduces the MVP effort. Ultimately the operator will need to
> > > > > > > > > > > > > also support the standalone mode. I would like to gain more
> > > > > > > > > > > > > confidence that native integration reduces the effort. While it
> > > > > > > > > > > > > cuts the effort to handle the TM pod creation, some mapping
> > > > > > > > > > > > > code from the CR to the native integration client and config
> > > > > > > > > > > > > needs to be created. As mentioned in the FLIP, native
> > > > > > > > > > > > > integration requires the Flink job manager to have access to
> > > > > > > > > > > > > the k8s API to create pods, which in some scenarios may be seen
> > > > > > > > > > > > > as unfavorable.
> > > > > > > > > > > >
> > > > > > > > > > > > > > > > # Pod Template
> > > > > > > > > > > > > > > > Is the pod template in the CR the same as what Flink already
> > > > > > > > > > > > > > > > supports[4]?
> > > > > > > > > > > > > > > > If so, I am afraid that not every arbitrary field (e.g.
> > > > > > > > > > > > > > > > cpu/memory resources) would take effect.
> > > > > > > > > > > >
> > > > > > > > > > > > > Yes, pod template would look almost identical. There are a few
> > > > > > > > > > > > > settings that the operator will control (and that may need to
> > > > > > > > > > > > > be blacklisted), but in general we would not want to place
> > > > > > > > > > > > > restrictions. I think a mechanism where a pod template is
> > > > > > > > > > > > > merged from multiple layers would also be interesting to make
> > > > > > > > > > > > > this more flexible.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Thomas
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>
