Hi Gyula!

I have reviewed the prototype design of the flink-kubernetes-operator you
submitted, and I have the following questions:

1. Could the JobSpec support pulling the Flink job JAR via a sidecar/init
container? For example:

> initContainers:
>   - name: downloader
>     image: curlimages/curl
>     env:
>       - name: JAR_URL
>         value: "https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.14.3/flink-examples-streaming_2.12-1.14.3-WordCount.jar"
>       - name: DEST_PATH
>         value: /cache/flink-app.jar
>     command: ['sh', '-c', 'curl -o ${DEST_PATH} ${JAR_URL}']
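
(For this to work I assume an emptyDir volume shared between the downloader
init container and the Flink main container; the volume name below is just
illustrative:)

> volumes:
>   - name: cache
>     emptyDir: {}
> # mounted in both the downloader init container and the Flink container:
> volumeMounts:
>   - name: cache
>     mountPath: /cache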

2. Can we add a savepoint path property to the job specification? (A rough
sketch of what I mean is shown after item 3's example below.)
3. Can we add an extra port to the JobManagerSpec and TaskManagerSpec to
expose additional services, such as Prometheus? The property could look like
this:

> extraPorts:
>   - name: prom
>     containerPort: 9249
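
For item 2, a rough sketch of the savepoint property (field names are only
illustrative, not a concrete API proposal):

> job:
>   jarURI: local:///opt/flink/usrlib/flink-app.jar
>   parallelism: 2
>   # restore the job from this savepoint on (re)deployment, e.g. by passing
>   # execution.savepoint.path at submission
>   initialSavepointPath: s3://my-bucket/savepoints/savepoint-abc123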



Best wishes,
Peng Yuan

On Tue, Feb 15, 2022 at 12:23 AM Gyula Fóra <gyf...@apache.org> wrote:

> Hi Flink Devs!
>
> We would like to present to you the first prototype of the
> flink-kubernetes-operator that was built based on the FLIP and the
> discussion on this mail thread. We would also like to call out some design
> decisions that we have made regarding architecture components that were not
> explicitly mentioned in the FLIP document/thread and give you the
> opportunity to raise any concerns here.
>
> You can find the initial prototype here:
> https://github.com/apache/flink-kubernetes-operator/pull/1
>
> We will leave the PR open for 1-2 days before merging to let people comment
> on it, but please be mindful that this is an initial prototype with many
> rough edges. It is not intended to be a complete implementation of the FLIP
> specs as that will take some more work from all of us :)
>
>
> *Prototype feature set:*
> The prototype contains a basic working version of the
> flink-kubernetes-operator that supports deployment and lifecycle
> management of a stateful native Flink application. We have basic support
> for stateful and stateless upgrades, UI ingress, pod templates, etc. Error
> handling at this point is largely missing.
>
>
> *Features / design decisions that were not explicitly discussed in this
> thread*
>
> *Basic Admission control using a Webhook*
> Standard resource admission control in Kubernetes to validate and
> potentially reject resources is done through Webhooks:
>
> https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
> This is a necessary mechanism to give the user an upfront error when an
> incorrect resource is submitted. In the Flink operator's case we need to
> validate that the FlinkDeployment YAML actually makes sense and does not
> contain erroneous config options that would inevitably lead to
> deployment/job failures.
>
> We have implemented a simple webhook that we can use for this type of
> validation, as a separate maven module (flink-kubernetes-webhook). The
> webhook is an optional component and can be enabled or disabled during
> deployment. To avoid pulling in new external dependencies we have used the
> Flink Shaded Netty module to build the simple REST endpoint required. If
> the community feels that Netty adds unnecessary complexity to the webhook
> implementation, we are open to alternative backends such as Spring Boot,
> which would practically eliminate the boilerplate.
>
>
> *Helm Chart for deployment*
> Helm charts provide an industry-standard way of managing Kubernetes
> deployments. We have created a Helm chart prototype that can be used to
> deploy the operator together with all required resources. The Helm chart
> allows easy configuration of things like images and namespaces, as well as
> flags to control specific parts of the deployment such as RBAC or the
> webhook.
>
> The helm chart provided is intended to be a first version that worked for
> us during development but we expect to have a lot of iterations on it based
> on the feedback from the community.
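>
> To give a rough idea of the knobs we have in mind, sketched as a values
> file (key names are illustrative only and will likely change):
>
> # values.yaml (illustrative sketch, not the final layout)
> image:
>   repository: <your-registry>/flink-kubernetes-operator
>   tag: latest
> operatorNamespace: flink-operator
> rbac:
>   create: true      # flag to control RBAC resource creation
> webhook:
>   create: false     # the webhook component is optional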
>
> *Acknowledgment*
> We would like to thank everyone who has provided support and valuable
> feedback on this FLIP.
> We would also like to thank Yang Wang & Alexis Sarda-Espinosa specifically
> for making their operators open source and available to us which had a big
> impact on the FLIP and the prototype.
>
> We are looking forward to continuing development on the operator together
> with the broader community.
> All work will be tracked using the ASF Jira from now on.
>
> Cheers,
> Gyula
>
> On Mon, Feb 14, 2022 at 9:21 AM K Fred <yuanpengf...@gmail.com> wrote:
>
> > Hi Gyula,
> >
> > Thanks!
> > It's great to see the project getting started and I can't wait to see the
> > PR and start contributing code. 😄😄😄
> >
> > Best Wishes!
> > Peng Yuan
> >
> > On Mon, Feb 14, 2022 at 4:14 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > > Hi Peng Yuan!
> > >
> > > The repo is already created:
> > > https://github.com/apache/flink-kubernetes-operator
> > >
> > > We will open the PR with the initial prototype later today, stay tuned
> in
> > > this thread! :)
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Mon, Feb 14, 2022 at 9:09 AM K Fred <yuanpengf...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Has the project of flink-kubernetes-operator been created in github?
> > > >
> > > > Peng Yuan
> > > >
> > > > On Wed, Feb 9, 2022 at 1:23 AM Gyula Fóra <gyula.f...@gmail.com>
> > wrote:
> > > >
> > > > > I agree with flink-kubernetes-operator as the repo name :)
> > > > > Don't have any better idea
> > > > >
> > > > > Gyula
> > > > >
> > > > > On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <t...@apache.org>
> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for the continued feedback and discussion. Looks like we
> are
> > > > > > ready to start a VOTE, I will initiate it shortly.
> > > > > >
> > > > > > In parallel it would be good to find the repository name.
> > > > > >
> > > > > > My suggestion would be: flink-kubernetes-operator
> > > > > >
> > > > > > I thought "flink-operator" could be a bit misleading since the
> term
> > > > > > operator already has a meaning in Flink.
> > > > > >
> > > > > > I also considered "flink-k8s-operator" but that would be almost
> > > > > > identical to existing operator implementations and could lead to
> > > > > > confusion in the future.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Thomas
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > Hi Danny,
> > > > > > >
> > > > > > > So far we have been focusing our dev efforts on the initial
> > native
> > > > > > > implementation with the team.
> > > > > > > If the discussion and vote goes well for this FLIP we are
> looking
> > > > > forward
> > > > > > > to contributing the initial version sometime next week (fingers
> > > > > crossed).
> > > > > > >
> > > > > > > At that point I think we can already start the dev work to
> > support
> > > > the
> > > > > > > standalone mode as well, especially if you can dedicate some
> > effort
> > > > to
> > > > > > > pushing that side.
> > > > > > > Working together on this sounds like a great idea and we should
> > > start
> > > > > as
> > > > > > > soon as possible! :)
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gyula
> > > > > > >
> > > > > > > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <
> > > > dannycran...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I have been discussing this one with my team. We are
> interested
> > > in
> > > > > the
> > > > > > > > Standalone mode, and are willing to contribute towards the
> > > > > > implementation.
> > > > > > > > Potentially we can work together to support both modes in
> > > parallel?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <
> > gyula.f...@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Danny!
> > > > > > > > >
> > > > > > > > > Thanks for the feedback :)
> > > > > > > > >
> > > > > > > > > Versioning:
> > > > > > > > > Versioning will be independent from Flink and the operator
> > will
> > > > > > depend
> > > > > > > > on a
> > > > > > > > > fixed flink version (in every given operator version).
> > > > > > > > > This should be the exact same setup as with Stateful
> > Functions
> > > (
> > > > > > > > > https://github.com/apache/flink-statefun). So independent
> > > > release
> > > > > > cycle
> > > > > > > > > but
> > > > > > > > > still within the Flink umbrella.
> > > > > > > > >
> > > > > > > > > Deployment error handling:
> > > > > > > > > I think that's a very good point, as general exception
> > handling
> > > > for
> > > > > > the
> > > > > > > > > different failure scenarios is a tricky problem. I think
> the
> > > > > > exception
> > > > > > > > > classifiers and retry strategies could avoid a lot of
> manual
> > > > > > intervention
> > > > > > > > > from the user. We will definitely need to add something
> like
> > > > this.
> > > > > > Once
> > > > > > > > we
> > > > > > > > > have the repo created with the initial operator code we
> > should
> > > > open
> > > > > > some
> > > > > > > > > tickets for this and put it on the short term roadmap!
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Gyula
> > > > > > > > >
> > > > > > > > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <
> > > > > > dannycran...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey team,
> > > > > > > > > >
> > > > > > > > > > Great work on the FLIP, I am looking forward to this
> one. I
> > > > agree
> > > > > > that
> > > > > > > > we
> > > > > > > > > > can move forward to the voting stage.
> > > > > > > > > >
> > > > > > > > > > I have general feedback around how we will handle job
> > > > submission
> > > > > > > > failure
> > > > > > > > > > and retry. As discussed in the Rejected Alternatives
> > section,
> > > > we
> > > > > > can
> > > > > > > > use
> > > > > > > > > > Java to handle job submission failures from the Flink
> > client.
> > > > It
> > > > > > would
> > > > > > > > be
> > > > > > > > > > useful to have the ability to configure exception
> > classifiers
> > > > and
> > > > > > retry
> > > > > > > > > > strategy as part of operator configuration.
> > > > > > > > > >
> > > > > > > > > > Given this will be in a separate Github repository I am
> > > curious
> > > > > how
> > > > > > > > the
> > > > > > > > > > versioning strategy will work in relation to the Flink
> > > version?
> > > > > Do
> > > > > > we
> > > > > > > > > have
> > > > > > > > > > any other components with a similar setup I can look at?
> > Will
> > > > the
> > > > > > > > > operator
> > > > > > > > > > version track Flink or will it use its own versioning
> > > strategy
> > > > > > with a
> > > > > > > > > Flink
> > > > > > > > > > version support matrix, or similar?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <
> > > > > > > > balassi.mar...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi team,
> > > > > > > > > > >
> > > > > > > > > > > Thank you for the great feedback, Thomas has updated
> the
> > > FLIP
> > > > > > page
> > > > > > > > > > > accordingly. If you are comfortable with the currently
> > > > existing
> > > > > > > > design
> > > > > > > > > > and
> > > > > > > > > > > depth in the FLIP [1] I suggest moving forward to the
> > > voting
> > > > > > stage -
> > > > > > > > > once
> > > > > > > > > > > that reaches a positive conclusion it lets us create
> the
> > > > > separate
> > > > > > > > code
> > > > > > > > > > > repository under the flink project for the operator.
> > > > > > > > > > >
> > > > > > > > > > > I encourage everyone to keep improving the details in
> the
> > > > > > meantime,
> > > > > > > > > > however
> > > > > > > > > > > I believe given the existing design and the general
> > > sentiment
> > > > > on
> > > > > > this
> > > > > > > > > > > thread that the most efficient path from here is
> starting
> > > the
> > > > > > > > > > > implementation so that we can collectively iterate over
> > it.
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <
> > > > t...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > HI Xintong,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the feedback and please see responses
> below
> > > -->
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <
> > > > > > > > tonysong...@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks Thomas for drafting this FLIP, and everyone
> > for
> > > > the
> > > > > > > > > > discussion.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also have a few questions and comments.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ## Job Submission
> > > > > > > > > > > > > Deploying a Flink session cluster via kubectl & CR
> > and
> > > > then
> > > > > > > > > > submitting
> > > > > > > > > > > > jobs
> > > > > > > > > > > > > to the cluster via Flink cli / REST is probably the
> > > > > approach
> > > > > > that
> > > > > > > > > > > > requires
> > > > > > > > > > > > > the least effort. However, I'd like to point out 2
> > > > > > weaknesses.
> > > > > > > > > > > > > 1. A lot of users use Flink in perjob/application
> > > modes.
> > > > > For
> > > > > > > > these
> > > > > > > > > > > users,
> > > > > > > > > > > > > having to run the job in two steps (deploy the
> > cluster,
> > > > and
> > > > > > > > submit
> > > > > > > > > > the
> > > > > > > > > > > > job)
> > > > > > > > > > > > > is not that convenient.
> > > > > > > > > > > > > 2. One of our motivations is being able to manage
> > Flink
> > > > > > > > > applications'
> > > > > > > > > > > > > lifecycles with kubectl. Submitting jobs from cli
> > > sounds
> > > > > not
> > > > > > > > > aligned
> > > > > > > > > > > with
> > > > > > > > > > > > > this motivation.
> > > > > > > > > > > > > I think it's probably worth it to support
> submitting
> > > jobs
> > > > > via
> > > > > > > > > > kubectl &
> > > > > > > > > > > > CR
> > > > > > > > > > > > > in the first version, both together with deploying
> > the
> > > > > > cluster
> > > > > > > > like
> > > > > > > > > > in
> > > > > > > > > > > > > perjob/application mode and after deploying the
> > cluster
> > > > > like
> > > > > > in
> > > > > > > > > > session
> > > > > > > > > > > > > mode.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The intention is to support application management
> > > through
> > > > > > operator
> > > > > > > > > and
> > > > > > > > > > > CR,
> > > > > > > > > > > > which means there won't be any 2 step submission
> > process,
> > > > > > which as
> > > > > > > > > you
> > > > > > > > > > > > allude to would defeat the purpose of this project.
> The
> > > CR
> > > > > > example
> > > > > > > > > > shows
> > > > > > > > > > > > the application part. Please note that the bare
> cluster
> > > > > > support is
> > > > > > > > an
> > > > > > > > > > > > *additional* feature for scenarios that require
> > external
> > > > job
> > > > > > > > > > management.
> > > > > > > > > > > Is
> > > > > > > > > > > > there anything on the FLIP page that creates a
> > different
> > > > > > > > impression?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ## Versioning
> > > > > > > > > > > > > Which Flink versions does the operator plan to
> > support?
> > > > > > > > > > > > > 1. Native K8s deployment was firstly introduced in
> > > Flink
> > > > > 1.10
> > > > > > > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > > > > > > 3. The Pod template support was introduced in Flink
> > > 1.13
> > > > > > > > > > > > > 4. There was some changes to the Flink docker image
> > > > > > entrypoint
> > > > > > > > > script
> > > > > > > > > > > in,
> > > > > > > > > > > > > IIRC, Flink 1.13
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Great, thanks for providing this. It is important for
> > the
> > > > > > > > > compatibility
> > > > > > > > > > > > going forward also. We are targeting Flink 1.14.x
> > > upwards.
> > > > > > Before
> > > > > > > > the
> > > > > > > > > > > > operator is ready there will be another Flink
> release.
> > > > Let's
> > > > > > see if
> > > > > > > > > > > anyone
> > > > > > > > > > > > is interested in earlier versions?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ## Compatibility
> > > > > > > > > > > > > What kind of API compatibility we can commit to?
> It's
> > > > > > probably
> > > > > > > > fine
> > > > > > > > > > to
> > > > > > > > > > > > have
> > > > > > > > > > > > > alpha / beta version APIs that allow incompatible
> > > future
> > > > > > changes
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > > > first version. But eventually we would need to
> > > guarantee
> > > > > > > > backwards
> > > > > > > > > > > > > compatibility, so that an early version CR can work
> > > with
> > > > a
> > > > > > new
> > > > > > > > > > version
> > > > > > > > > > > > > operator.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Another great point and please let me include that on
> > the
> > > > > FLIP
> > > > > > > > page.
> > > > > > > > > > ;-)
> > > > > > > > > > > >
> > > > > > > > > > > > I think we should allow incompatible changes for the
> > > first
> > > > > one
> > > > > > or
> > > > > > > > two
> > > > > > > > > > > > versions, similar to how other major features have
> > > evolved
> > > > > > > > recently,
> > > > > > > > > > such
> > > > > > > > > > > > as FLIP-27.
> > > > > > > > > > > >
> > > > > > > > > > > > Would be great to get broader feedback on this one.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Thomas
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <
> > > > > t...@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > # 1 Flink Native vs Standalone integration
> > > > > > > > > > > > > > > Maybe we should make this more clear in the
> FLIP
> > > but
> > > > we
> > > > > > > > agreed
> > > > > > > > > to
> > > > > > > > > > > do
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > first version of the operator based on the
> native
> > > > > > > > integration.
> > > > > > > > > > > > > > > While this clearly does not cover all use-cases
> > and
> > > > > > > > > requirements,
> > > > > > > > > > > it
> > > > > > > > > > > > > > seems
> > > > > > > > > > > > > > > this would lead to a much smaller initial
> effort
> > > and
> > > > a
> > > > > > nicer
> > > > > > > > > > first
> > > > > > > > > > > > > > version.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm also leaning towards the native integration,
> as
> > > > long
> > > > > > as it
> > > > > > > > > > > reduces
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > MVP effort. Ultimately the operator will need to
> > also
> > > > > > support
> > > > > > > > the
> > > > > > > > > > > > > > standalone mode. I would like to gain more
> > confidence
> > > > > that
> > > > > > > > native
> > > > > > > > > > > > > > integration reduces the effort. While it cuts the
> > > > effort
> > > > > to
> > > > > > > > > handle
> > > > > > > > > > > the
> > > > > > > > > > > > TM
> > > > > > > > > > > > > > pod creation, some mapping code from the CR to
> the
> > > > native
> > > > > > > > > > integration
> > > > > > > > > > > > > > client and config needs to be created. As
> mentioned
> > > in
> > > > > the
> > > > > > > > FLIP,
> > > > > > > > > > > native
> > > > > > > > > > > > > > integration requires the Flink job manager to
> have
> > > > access
> > > > > > to
> > > > > > > > the
> > > > > > > > > > k8s
> > > > > > > > > > > > API
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > create pods, which in some scenarios may be seen
> as
> > > > > > > > unfavorable.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  > > > # Pod Template
> > > > > > > > > > > > > > > > > Is the pod template in CR same with what
> > Flink
> > > > has
> > > > > > > > already
> > > > > > > > > > > > > > > supported[4]?
> > > > > > > > > > > > > > > > > Then I am afraid not the arbitrary
> field(e.g.
> > > > > > cpu/memory
> > > > > > > > > > > > resources)
> > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > take effect.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, pod template would look almost identical.
> > There
> > > > are
> > > > > a
> > > > > > few
> > > > > > > > > > > settings
> > > > > > > > > > > > > > that the operator will control (and that may need
> > to
> > > be
> > > > > > > > > > blacklisted),
> > > > > > > > > > > > but
> > > > > > > > > > > > > > in general we would not want to place
> > restrictions. I
> > > > > > think a
> > > > > > > > > > > mechanism
> > > > > > > > > > > > > > where a pod template is merged from multiple
> layers
> > > > would
> > > > > > also
> > > > > > > > be
> > > > > > > > > > > > > > interesting to make this more flexible.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Thomas
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
