Danny Cranmer mentioned they are interested in standalone mode, and I am too, 
so I just wanted to say that if that development starts in parallel, I might be 
able to contribute a little.

Regarding the CRD, I agree it would be nice to avoid as many "duplications" as 
possible if pod templates are to be used. In my PoC I even tried to make use of 
existing configuration options like kubernetes.container.image & pipeline.jars 
[1]. For CPU/Memory resources, the discussion in [2] might be relevant.

[1] 
https://github.com/MicroFocus/opsb-flink-k8s-operator/blob/main/kubernetes/sample_batch_job.yaml
[2] https://issues.apache.org/jira/browse/FLINK-24150

Regards,
Alexis.

-----Original Message-----
From: K Fred <yuanpengf...@gmail.com> 
Sent: Monday, February 7, 2022 09:36
To: dev@flink.apache.org
Subject: Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

Hi Gyula!

You are right. I think some common Flink config options can be put in the CR, 
while other expert settings continue to be set through the Flink configuration, 
so the user can still choose to customize them.
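
A minimal sketch of that precedence, assuming the CR fields named in this 
thread ("flinkConfiguration" and "taskManager.taskSlots"). The function name 
and the "resource.memory" field are hypothetical and only serve as 
illustration; this is not the operator's actual code.

```python
def effective_flink_config(spec: dict) -> dict:
    """Merge the user's Flink config with first-class CR fields.

    Hypothetical helper: first-class CR fields are applied last,
    so they win over the same key in "flinkConfiguration".
    """
    # Start from whatever the user put under flinkConfiguration (expert settings).
    config = dict(spec.get("flinkConfiguration", {}))

    task_manager = spec.get("taskManager", {})

    # taskManager.taskSlots maps to taskmanager.numberOfTaskSlots.
    if "taskSlots" in task_manager:
        config["taskmanager.numberOfTaskSlots"] = str(task_manager["taskSlots"])

    # Assumed field: taskManager.resource.memory maps to the process size.
    memory = task_manager.get("resource", {}).get("memory")
    if memory is not None:
        config["taskmanager.memory.process.size"] = memory

    return config


if __name__ == "__main__":
    example_spec = {
        "flinkConfiguration": {
            "taskmanager.numberOfTaskSlots": "2",
            "state.backend": "rocksdb",
        },
        "taskManager": {"taskSlots": 4, "resource": {"memory": "2048m"}},
    }
    for key, value in sorted(effective_flink_config(example_spec).items()):
        print(f"{key}: {value}")
```

Keys that only appear in "flinkConfiguration" pass through untouched, so 
expert settings stay customizable.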

Best Wishes,
Peng Yuan

On Mon, Feb 7, 2022 at 4:16 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi Yangze!
>
> This is not set in stone at the moment but the way I think it should 
> work is that first-class config options in the CR should always take 
> precedence over the Flink config.
>
> In general we should not introduce too many arbitrary config options 
> that duplicate the Flink configs without good reason, but the ones we 
> do introduce should override the Flink configs.
>
> We should discuss and decide together what config options to keep in 
> the Flink conf and what to bring to the CR level. Resource-related 
> ones are difficult because on one hand they are integral to every 
> application, on the other hand there are many expert settings that we 
> should probably leave in the conf.
>
> Cheers,
> Gyula
>
>
>
> On Mon, Feb 7, 2022 at 8:28 AM Yangze Guo <karma...@gmail.com> wrote:
>
> > Thanks everyone for the great effort. The FLIP looks really good.
> >
> > I just want to confirm the configuration priority in the CR example.
> > It seems the requested resources or "taskManager.taskSlots" will be 
> > translated to Flink internal config options, e.g.
> > "taskmanager.memory.process.size" and 
> > "taskmanager.numberOfTaskSlots", and override the ones in 
> > "flinkConfiguration". Am I understanding this correctly?
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Feb 7, 2022 at 10:22 AM Xintong Song <tonysong...@gmail.com>
> > wrote:
> > >
> > > Sorry for the late reply. We were out due to the public holidays in
> > > China.
> > >
> > > @Thomas,
> > >
> > > > The intention is to support application management through operator
> > > > and CR, which means there won't be any 2 step submission process, which
> > > > as you allude to would defeat the purpose of this project. The CR
> > > > example shows the application part. Please note that the bare cluster
> > > > support is an *additional* feature for scenarios that require external
> > > > job management. Is there anything on the FLIP page that creates a
> > > > different impression?
> > > >
> > >
> > > Sounds good to me. I don't remember what created the impression of 2
> > > step submission back then. I revisited the latest version of this FLIP
> > > and it looks good to me.
> > >
> > > @Gyula,
> > >
> > > > Versioning:
> > > > Versioning will be independent from Flink and the operator will depend
> > > > on a fixed Flink version (in every given operator version).
> > > > This should be the exact same setup as with Stateful Functions
> > > > (https://github.com/apache/flink-statefun). So independent release
> > > > cycle but still within the Flink umbrella.
> > > >
> > >
> > > Does this mean if someone wants to upgrade Flink to a version that is
> > > released after the operator version that is being used, he/she would
> > > need to upgrade the operator version first?
> > > I'm not questioning this, just trying to make sure I'm understanding
> > > this correctly.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Mon, Feb 7, 2022 at 3:14 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > >
> > > > Thank you Alexis,
> > > >
> > > > Will definitely check this out. You are right, Kotlin makes it
> > > > difficult to adopt pieces of this code directly but I think it will be
> > > > good to get inspiration for the architecture and look at how particular
> > > > problems have been solved. It will be a great help for us I am sure.
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > On Sat, Feb 5, 2022 at 12:28 PM Alexis Sarda-Espinosa < 
> > > > alexis.sarda-espin...@microfocus.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > just wanted to mention that my employer agreed to open source the
> > > > > PoC I developed:
> > > > > https://github.com/MicroFocus/opsb-flink-k8s-operator
> > > > >
> > > > > I understand the concern for maintainability, so Gradle & Kotlin
> > > > > might not be appealing to you, but at least it gives you another
> > > > > reference. The Helm resources in particular might be useful.
> > > > >
> > > > > There are bits and pieces there referring to Flink sessions, but
> > > > > those are just placeholders, the functioning parts use application
> > > > > mode with native integration.
> > > > >
> > > > > Regards,
> > > > > Alexis.
> > > > >
> > > > > ________________________________
> > > > > From: Thomas Weise <t...@apache.org>
> > > > > Sent: Saturday, February 5, 2022 2:41 AM
> > > > > To: dev <dev@flink.apache.org>
> > > > > Subject: Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for the continued feedback and discussion. Looks like 
> > > > > we are ready to start a VOTE, I will initiate it shortly.
> > > > >
> > > > > In parallel it would be good to find the repository name.
> > > > >
> > > > > My suggestion would be: flink-kubernetes-operator
> > > > >
> > > > > I thought "flink-operator" could be a bit misleading since the 
> > > > > term operator already has a meaning in Flink.
> > > > >
> > > > > I also considered "flink-k8s-operator" but that would be 
> > > > > almost identical to existing operator implementations and 
> > > > > could lead to confusion in the future.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Danny,
> > > > > >
> > > > > > So far we have been focusing our dev efforts on the initial native
> > > > > > implementation with the team.
> > > > > > If the discussion and vote goes well for this FLIP we are looking
> > > > > > forward to contributing the initial version sometime next week
> > > > > > (fingers crossed).
> > > > > >
> > > > > > At that point I think we can already start the dev work to support
> > > > > > the standalone mode as well, especially if you can dedicate some
> > > > > > effort to pushing that side.
> > > > > > Working together on this sounds like a great idea and we should
> > > > > > start as soon as possible! :)
> > > > > >
> > > > > > Cheers,
> > > > > > Gyula
> > > > > >
> > > > > > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <dannycran...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I have been discussing this one with my team. We are interested in
> > > > > > > the Standalone mode, and are willing to contribute towards the
> > > > > > > implementation. Potentially we can work together to support both
> > > > > > > modes in parallel?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Danny!
> > > > > > > >
> > > > > > > > Thanks for the feedback :)
> > > > > > > >
> > > > > > > > Versioning:
> > > > > > > > Versioning will be independent from Flink and the operator will
> > > > > > > > depend on a fixed Flink version (in every given operator version).
> > > > > > > > This should be the exact same setup as with Stateful Functions
> > > > > > > > (https://github.com/apache/flink-statefun). So independent release
> > > > > > > > cycle but still within the Flink umbrella.
> > > > > > > >
> > > > > > > > Deployment error handling:
> > > > > > > > I think that's a very good point, as general exception handling for
> > > > > > > > the different failure scenarios is a tricky problem. I think the
> > > > > > > > exception classifiers and retry strategies could avoid a lot of
> > > > > > > > manual intervention from the user. We will definitely need to add
> > > > > > > > something like this. Once we have the repo created with the initial
> > > > > > > > operator code we should open some tickets for this and put it on
> > > > > > > > the short term roadmap!
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Gyula
> > > > > > > >
> > > > > > > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <dannycran...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey team,
> > > > > > > > >
> > > > > > > > > Great work on the FLIP, I am looking forward to this one. I agree
> > > > > > > > > that we can move forward to the voting stage.
> > > > > > > > >
> > > > > > > > > I have general feedback around how we will handle job submission
> > > > > > > > > failure and retry. As discussed in the Rejected Alternatives
> > > > > > > > > section, we can use Java to handle job submission failures from
> > > > > > > > > the Flink client. It would be useful to have the ability to
> > > > > > > > > configure exception classifiers and retry strategy as part of
> > > > > > > > > operator configuration.
> > > > > > > > >
> > > > > > > > > Given this will be in a separate GitHub repository I am curious
> > > > > > > > > how the versioning strategy will work in relation to the Flink
> > > > > > > > > version? Do we have any other components with a similar setup I
> > > > > > > > > can look at? Will the operator version track Flink or will it use
> > > > > > > > > its own versioning strategy with a Flink version support matrix,
> > > > > > > > > or similar?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <balassi.mar...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi team,
> > > > > > > > > >
> > > > > > > > > > Thank you for the great feedback, Thomas has updated the FLIP
> > > > > > > > > > page accordingly. If you are comfortable with the currently
> > > > > > > > > > existing design and depth in the FLIP [1] I suggest moving
> > > > > > > > > > forward to the voting stage - once that reaches a positive
> > > > > > > > > > conclusion it lets us create the separate code repository under
> > > > > > > > > > the flink project for the operator.
> > > > > > > > > >
> > > > > > > > > > I encourage everyone to keep improving the details in the
> > > > > > > > > > meantime, however I believe given the existing design and the
> > > > > > > > > > general sentiment on this thread that the most efficient path
> > > > > > > > > > from here is starting the implementation so that we can
> > > > > > > > > > collectively iterate over it.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <t...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Xintong,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the feedback and please see responses below -->
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <tonysong...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks Thomas for drafting this FLIP, and everyone for the
> > > > > > > > > > > > discussion.
> > > > > > > > > > > >
> > > > > > > > > > > > I also have a few questions and comments.
> > > > > > > > > > > >
> > > > > > > > > > > > ## Job Submission
> > > > > > > > > > > > Deploying a Flink session cluster via kubectl & CR and then
> > > > > > > > > > > > submitting jobs to the cluster via Flink cli / REST is probably
> > > > > > > > > > > > the approach that requires the least effort. However, I'd like
> > > > > > > > > > > > to point out 2 weaknesses.
> > > > > > > > > > > > 1. A lot of users use Flink in perjob/application modes. For
> > > > > > > > > > > > these users, having to run the job in two steps (deploy the
> > > > > > > > > > > > cluster, and submit the job) is not that convenient.
> > > > > > > > > > > > 2. One of our motivations is being able to manage Flink
> > > > > > > > > > > > applications' lifecycles with kubectl. Submitting jobs from cli
> > > > > > > > > > > > sounds not aligned with this motivation.
> > > > > > > > > > > > I think it's probably worth it to support submitting jobs via
> > > > > > > > > > > > kubectl & CR in the first version, both together with deploying
> > > > > > > > > > > > the cluster like in perjob/application mode and after deploying
> > > > > > > > > > > > the cluster like in session mode.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The intention is to support application management through
> > > > > > > > > > > operator and CR, which means there won't be any 2 step submission
> > > > > > > > > > > process, which as you allude to would defeat the purpose of this
> > > > > > > > > > > project. The CR example shows the application part. Please note
> > > > > > > > > > > that the bare cluster support is an *additional* feature for
> > > > > > > > > > > scenarios that require external job management. Is there anything
> > > > > > > > > > > on the FLIP page that creates a different impression?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > ## Versioning
> > > > > > > > > > > > Which Flink versions does the operator plan to support?
> > > > > > > > > > > > 1. Native K8s deployment was first introduced in Flink 1.10
> > > > > > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > > > > > 3. The Pod template support was introduced in Flink 1.13
> > > > > > > > > > > > 4. There were some changes to the Flink docker image entrypoint
> > > > > > > > > > > > script in, IIRC, Flink 1.13
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Great, thanks for providing this. It is important for the
> > > > > > > > > > > compatibility going forward also. We are targeting Flink 1.14.x
> > > > > > > > > > > upwards. Before the operator is ready there will be another Flink
> > > > > > > > > > > release. Let's see if anyone is interested in earlier versions?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > ## Compatibility
> > > > > > > > > > > > What kind of API compatibility can we commit to? It's probably
> > > > > > > > > > > > fine to have alpha / beta version APIs that allow incompatible
> > > > > > > > > > > > future changes for the first version. But eventually we would
> > > > > > > > > > > > need to guarantee backwards compatibility, so that an early
> > > > > > > > > > > > version CR can work with a new version operator.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Another great point and please let me include that on the FLIP
> > > > > > > > > > > page. ;-)
> > > > > > > > > > >
> > > > > > > > > > > I think we should allow incompatible changes for the first one or
> > > > > > > > > > > two versions, similar to how other major features have evolved
> > > > > > > > > > > recently, such as FLIP-27.
> > > > > > > > > > >
> > > > > > > > > > > Would be great to get broader feedback on this one.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Thomas
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <t...@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > # 1 Flink Native vs Standalone integration
> > > > > > > > > > > > > > Maybe we should make this more clear in the FLIP but we agreed
> > > > > > > > > > > > > > to do the first version of the operator based on the native
> > > > > > > > > > > > > > integration.
> > > > > > > > > > > > > > While this clearly does not cover all use-cases and
> > > > > > > > > > > > > > requirements, it seems this would lead to a much smaller
> > > > > > > > > > > > > > initial effort and a nicer first version.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm also leaning towards the native integration, as long as it
> > > > > > > > > > > > > reduces the MVP effort. Ultimately the operator will need to
> > > > > > > > > > > > > also support the standalone mode. I would like to gain more
> > > > > > > > > > > > > confidence that native integration reduces the effort. While it
> > > > > > > > > > > > > cuts the effort to handle the TM pod creation, some mapping code
> > > > > > > > > > > > > from the CR to the native integration client and config needs to
> > > > > > > > > > > > > be created. As mentioned in the FLIP, native integration
> > > > > > > > > > > > > requires the Flink job manager to have access to the k8s API to
> > > > > > > > > > > > > create pods, which in some scenarios may be seen as unfavorable.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > # Pod Template
> > > > > > > > > > > > > > Is the pod template in CR same with what Flink has already
> > > > > > > > > > > > > > supported[4]?
> > > > > > > > > > > > > > Then I am afraid not the arbitrary field(e.g. cpu/memory
> > > > > > > > > > > > > > resources) could take effect.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, pod template would look almost identical. There are a few
> > > > > > > > > > > > > settings that the operator will control (and that may need to be
> > > > > > > > > > > > > blacklisted), but in general we would not want to place
> > > > > > > > > > > > > restrictions. I think a mechanism where a pod template is merged
> > > > > > > > > > > > > from multiple layers would also be interesting to make this more
> > > > > > > > > > > > > flexible.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Thomas
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> >
>
