Hi All!

Thank you all for reviewing the PR and already helping to make it better. I have opened a bunch of Jira tickets under https://issues.apache.org/jira/browse/FLINK-25963 based on some of the comments and on features that are still incomplete.
Given that there were no major objections to the prototype, I will merge it now so we can start collaborating.

Cheers,
Gyula

On Wed, Feb 16, 2022 at 3:52 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Thanks for the explanation.
> Given that this is unrelated to the Java version used in Flink, starting with Java 11 for the flink-kubernetes-operator makes sense to me.
>
> Best,
> Yang
>
> On Tue, Feb 15, 2022 at 11:57 PM Thomas Weise <t...@apache.org> wrote:
>
> > Hi,
> >
> > At this point I see no reason to support Java 8 for a new project. Java 8 is being phased out; we should start with 11.
> >
> > Also, since the operator isn't a library but effectively just a docker image, the ability to change the Java version isn't as critical as it is for Flink core, which needs to run in many different environments.
> >
> > Cheers,
> > Thomas
> >
> > On Tue, Feb 15, 2022 at 4:50 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > > Hi Devs,
> > >
> > > Yang Wang discovered that the current prototype is not compatible with Java 8, only with Java 11 and upwards.
> > >
> > > The reason is that the Java Operator SDK itself is unfortunately not Java 8 compatible.
> > >
> > > Given that Java 8 is on the road to deprecation and that the operator runs as a containerized deployment, are there any concerns regarding making the target Java version 11?
> > > This should not affect deployed Flink clusters and jobs, which should still work with Java 8; it only affects the Kubernetes operator itself.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Tue, Feb 15, 2022 at 1:06 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > >
> > > > I also lean towards not introducing the savepoint/checkpoint related fields to the job spec, especially at the very beginning of flink-kubernetes-operator.
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > On Tue, Feb 15, 2022 at 7:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >
> > > > > Hi Peng Yuan!
> > > > >
> > > > > While I do agree that the savepoint path is a very important production configuration, a lot of other things come to mind:
> > > > > - savepoint dir
> > > > > - checkpoint dir
> > > > > - checkpoint interval/timeout
> > > > > - high availability settings (provider/storagedir etc.)
> > > > >
> > > > > just to name a few...
> > > > >
> > > > > While these are all production critical, they have nice clean Flink config settings to go with them. If we start introducing these to the job spec, we only get confusion about priority order etc., and it is going to be hard to change or remove them in the future. In any case, we should validate that these configs exist in cases where users use a stateful upgrade mode, for example. This is something we need to add for sure.
> > > > >
> > > > > As for the other options you mentioned, like automatic savepoint generation for instance, those deserve an independent discussion of their own, I believe :)
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Tue, Feb 15, 2022 at 11:23 AM K Fred <yuanpengf...@gmail.com> wrote:
> > > > >
> > > > > > Hi Matyas!
> > > > > >
> > > > > > Thanks for your reply!
> > > > > > For scenarios 1 and 3, I couldn't agree more with the podTemplate solution; I missed this part.
> > > > > > For the savepoint related configuration, I think it is very important that it can be specified in the JobSpec, because a savepoint is a very common configuration for upgrading a job; placed in the JobSpec, it is obvious to the user how to configure it. In addition, other advanced properties can be put into flinkConfiguration, customized by expert users.
> > > > > > A set of savepoint configuration options could look as follows:
> > > > > >
> > > > > > > fromSavepoint: savepoint to restart the job from
> > > > > > > autoSavepointSecond: automatically take a savepoint to the `savepointsDir` every n seconds
> > > > > > > savepointsDir: directory in which to store automatically taken savepoints
> > > > > > > savepointGeneration: updated savepoint generation in the job status for a running job (should be defined in JobStatus)
> > > > > >
> > > > > > Best wishes,
> > > > > > Peng Yuan.
> > > > > >
> > > > > > On Tue, Feb 15, 2022 at 4:41 PM Őrhidi Mátyás <matyas.orh...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Peng,
> > > > > > >
> > > > > > > Thanks for your feedback. Regarding scenarios 1 and 3, the podTemplate functionality in the operator could cover both. We also need to be careful about introducing proxy parameters in the CRD spec. The savepoint path, for example, is usually accompanied by a bunch of other configurations, so users need to use configuration params anyway. What do you think?
> > > > > > >
> > > > > > > Best,
> > > > > > > Matyas
> > > > > > >
> > > > > > > On Tue, Feb 15, 2022 at 8:58 AM K Fred <yuanpengf...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Gyula!
> > > > > > > >
> > > > > > > > I have reviewed the prototype design of flink-kubernetes-operator you submitted, and I have the following questions:
> > > > > > > >
> > > > > > > > 1. Can the JobSpec support a Flink JAR package that is pulled by a sidecar container?
> > > > > > > > Just like this:
> > > > > > > >
> > > > > > > > > initContainers:
> > > > > > > > >   - name: downloader
> > > > > > > > >     image: curlimages/curl
> > > > > > > > >     env:
> > > > > > > > >       - name: JAR_URL
> > > > > > > > >         value: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.14.3/flink-examples-streaming_2.12-1.14.3-WordCount.jar
> > > > > > > > >       - name: DEST_PATH
> > > > > > > > >         value: /cache/flink-app.jar
> > > > > > > > >     command: ['sh', '-c', 'curl -o ${DEST_PATH} ${JAR_URL}']
> > > > > > > >
> > > > > > > > 2. Can we add a savepoint path property to the job specification?
> > > > > > > > 3. Can we add an extra port to the JobManagerSpec and TaskManagerSpec to expose some service, such as Prometheus? The property could look like this:
> > > > > > > >
> > > > > > > > > extraPorts:
> > > > > > > > >   - name: prom
> > > > > > > > >     containerPort: 9249
> > > > > > > >
> > > > > > > > Best wishes,
> > > > > > > > Peng Yuan
> > > > > > > >
> > > > > > > > On Tue, Feb 15, 2022 at 12:23 AM Gyula Fóra <gyf...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Flink Devs!
> > > > > > > > >
> > > > > > > > > We would like to present to you the first prototype of the flink-kubernetes-operator that was built based on the FLIP and the discussion on this mail thread. We would also like to call out some design decisions that we have made regarding architecture components that were not explicitly mentioned in the FLIP document/thread and give you the opportunity to raise any concerns here.
> > > > > > > > >
> > > > > > > > > You can find the initial prototype here:
> > > > > > > > > https://github.com/apache/flink-kubernetes-operator/pull/1
> > > > > > > > >
> > > > > > > > > We will leave the PR open for 1-2 days before merging to let people comment on it, but please be mindful that this is an initial prototype with many rough edges. It is not intended to be a complete implementation of the FLIP specs, as that will take some more work from all of us :)
> > > > > > > > >
> > > > > > > > > *Prototype feature set:*
> > > > > > > > > The prototype contains a basic working version of the flink-kubernetes-operator that supports deployment and lifecycle management of a stateful native Flink application. We have basic support for stateful and stateless upgrades, UI ingress, pod templates etc. Error handling at this point is largely missing.
> > > > > > > > >
> > > > > > > > > *Features / design decisions that were not explicitly discussed in this thread*
> > > > > > > > >
> > > > > > > > > *Basic Admission control using a Webhook*
> > > > > > > > > Standard resource admission control in Kubernetes to validate and potentially reject resources is done through Webhooks.
> > > > > > > > > https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
> > > > > > > > > This is a necessary mechanism to give the user an upfront error when an incorrect resource is submitted.
In the Flink operator's > > case we > > > > > > need > > > > > > > to > > > > > > > > > validate that the FlinkDeployment yaml actually makes sense > > and > > > > > does > > > > > > > not > > > > > > > > > contain erroneous config options that would inevitably lead > > to > > > > > > > > > deployment/job failures. > > > > > > > > > > > > > > > > > > We have implemented a simple webhook that we can use for > this > > > > type > > > > > of > > > > > > > > > validation, as a separate maven module > > > > (flink-kubernetes-webhook). > > > > > > The > > > > > > > > > webhook is an optional component and can be enabled or > > disabled > > > > > > during > > > > > > > > > deployment. To avoid pulling in new external dependencies > we > > have > > > > > > used > > > > > > > > the > > > > > > > > > Flink Shaded Netty module to build the simple rest endpoint > > > > > required. > > > > > > > If > > > > > > > > > the community feels that Netty adds unnecessary complexity > > to the > > > > > > > webhook > > > > > > > > > implementation we are open to alternative backends such as > > > > > Springboot > > > > > > > for > > > > > > > > > instance which would practically eliminate all the > > boilerplate. > > > > > > > > > > > > > > > > > > > > > > > > > > > *Helm Chart for deployment*Helm charts provide an industry > > > > standard > > > > > > way > > > > > > > > of > > > > > > > > > managing kubernetes deployments. We have created a helm > chart > > > > > > prototype > > > > > > > > > that can be used to deploy the operator together with all > > > > required > > > > > > > > > resources. The helm chart allows easy configuration for > > things > > > > like > > > > > > > > images, > > > > > > > > > namespaces etc and flags to control specific parts of the > > > > > deployment > > > > > > > such > > > > > > > > > as RBAC or the webhook. 
> > > > > > > > >
> > > > > > > > > The Helm chart provided is intended to be a first version that worked for us during development, but we expect to have a lot of iterations on it based on feedback from the community.
> > > > > > > > >
> > > > > > > > > *Acknowledgment*
> > > > > > > > > We would like to thank everyone who has provided support and valuable feedback on this FLIP.
> > > > > > > > > We would also like to thank Yang Wang & Alexis Sarda-Espinosa specifically for making their operators open source and available to us, which had a big impact on the FLIP and the prototype.
> > > > > > > > >
> > > > > > > > > We are looking forward to continuing development on the operator together with the broader community.
> > > > > > > > > All work will be tracked using the ASF Jira from now on.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Gyula
> > > > > > > > >
> > > > > > > > > On Mon, Feb 14, 2022 at 9:21 AM K Fred <yuanpengf...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Gyula,
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > It's great to see the project getting started and I can't wait to see the PR and start contributing code.😄😄😄
> > > > > > > > > >
> > > > > > > > > > Best Wishes!
> > > > > > > > > > Peng Yuan
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 14, 2022 at 4:14 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Peng Yuan!
> > > > > > > > > > >
> > > > > > > > > > > The repo is already created:
> > > > > > > > > > > https://github.com/apache/flink-kubernetes-operator
> > > > > > > > > > >
> > > > > > > > > > > We will open the PR with the initial prototype later today, stay tuned in this thread! :)
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Gyula
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 14, 2022 at 9:09 AM K Fred <yuanpengf...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > > Has the flink-kubernetes-operator project been created on GitHub?
> > > > > > > > > > > >
> > > > > > > > > > > > Peng Yuan
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Feb 9, 2022 at 1:23 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I agree with flink-kubernetes-operator as the repo name :)
> > > > > > > > > > > > > Don't have any better idea
> > > > > > > > > > > > >
> > > > > > > > > > > > > Gyula
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <t...@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the continued feedback and discussion. Looks like we are ready to start a VOTE, I will initiate it shortly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In parallel it would be good to find the repository name.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My suggestion would be: flink-kubernetes-operator
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I thought "flink-operator" could be a bit misleading since the term operator already has a meaning in Flink.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I also considered "flink-k8s-operator" but that would be almost identical to existing operator implementations and could lead to confusion in the future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Thomas
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So far we have been focusing our dev efforts on the initial native implementation with the team.
> > > > > > > > > > > > > > > If the discussion and vote go well for this FLIP, we are looking forward to contributing the initial version sometime next week (fingers crossed).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > At that point I think we can already start the dev work to support the standalone mode as well, especially if you can dedicate some effort to pushing that side.
> > > > > > > > > > > > > > > Working together on this sounds like a great idea and we should start as soon as possible! :)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Gyula
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <dannycran...@apache.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have been discussing this one with my team. We are interested in the Standalone mode, and are willing to contribute towards the implementation. Potentially we can work together to support both modes in parallel?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Danny!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the feedback :)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Versioning:
> > > > > > > > > > > > > > > > > Versioning will be independent from Flink, and each operator version will depend on a fixed Flink version.
> > > > > > > > > > > > > > > > > This should be the exact same setup as with Stateful Functions (https://github.com/apache/flink-statefun): an independent release cycle, but still within the Flink umbrella.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Deployment error handling:
> > > > > > > > > > > > > > > > > I think that's a very good point, as general exception handling for the different failure scenarios is a tricky problem. I think the exception classifiers and retry strategies could avoid a lot of manual intervention from the user. We will definitely need to add something like this.
> > > > > > > > > > > > > > > > > Once we have the repo created with the initial operator code, we should open some tickets for this and put it on the short term roadmap!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Gyula
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <dannycran...@apache.org> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hey team,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Great work on the FLIP, I am looking forward to this one. I agree that we can move forward to the voting stage.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have general feedback around how we will handle job submission failure and retry. As discussed in the Rejected Alternatives section, we can use Java to handle job submission failures from the Flink client.
> > > > > > > > > > > > > > > > > > It would be useful to have the ability to configure exception classifiers and retry strategy as part of operator configuration.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Given this will be in a separate GitHub repository, I am curious how the versioning strategy will work in relation to the Flink version. Do we have any other components with a similar setup I can look at? Will the operator version track Flink, or will it use its own versioning strategy with a Flink version support matrix, or similar?
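
The exception classifier and retry strategy idea raised in this message can be sketched as follows. This is a minimal illustration under stated assumptions: the class `SubmissionRetrySketch`, the method `submitWithRetry`, and its parameters are all hypothetical names, and nothing here reflects how the operator or the Flink client actually handles submission failures. The classifier is a plain predicate that decides whether an exception is worth retrying, and the retry strategy is exponential backoff with a bounded number of attempts:

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical sketch of a configurable classifier plus retry strategy. */
public class SubmissionRetrySketch {

    /**
     * Runs the submission, retrying only exceptions the classifier marks as retryable.
     * The wait time doubles after every failed attempt.
     */
    public static <T> T submitWithRetry(
            Callable<T> submission,
            Predicate<Exception> isRetryable,
            int maxAttempts,
            long initialBackoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return submission.call();
            } catch (Exception e) {
                // Give up on the last attempt or when the failure is classified as permanent.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }
}
```

For example, a caller could pass `e -> e instanceof java.io.IOException` as the classifier so that transient I/O failures are retried while configuration errors fail immediately; which exceptions belong in which class is exactly the part that would need to be configurable.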
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <balassi.mar...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi team,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you for the great feedback, Thomas has updated the FLIP page accordingly. If you are comfortable with the current design and depth of the FLIP [1], I suggest moving forward to the voting stage. Once that reaches a positive conclusion, it lets us create the separate code repository under the Flink project for the operator.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I encourage everyone to keep improving the details in the meantime; however, I believe, given the existing design and the general sentiment on this thread, that the most efficient path from here is starting the implementation so that we can collectively iterate over it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <t...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the feedback and please see responses below -->
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks Thomas for drafting this FLIP, and everyone for the discussion.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I also have a few questions and comments.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ## Job Submission
> > > > > > > > > > > > > > > > > > > > > Deploying a Flink session cluster via kubectl & CR and then submitting jobs to the cluster via Flink CLI / REST is probably the approach that requires the least effort. However, I'd like to point out 2 weaknesses.
> > > > > > > > > > > > > > > > > > > > > 1. A lot of users use Flink in per-job/application modes. For these users, having to run the job in two steps (deploy the cluster, and submit the job) is not that convenient.
> > > > > > > > > > > > > > > > > > > > > 2. One of our motivations is being able to manage Flink applications' lifecycles with kubectl. Submitting jobs from the CLI does not sound aligned with this motivation.
> > > > > > > > > > > > > > > > > > > > > I think it's probably worth it to support submitting jobs via kubectl & CR in the first version, both together with deploying the cluster, as in per-job/application mode, and after deploying the cluster, as in session mode.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The intention is to support application management through operator and CR, which means there won't be any 2 step submission process, which as you allude to would defeat the purpose of this project.
> > > > > > > > > > > > > > > > > > > > The CR example shows the application part. Please note that the bare cluster support is an *additional* feature for scenarios that require external job management. Is there anything on the FLIP page that creates a different impression?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ## Versioning
> > > > > > > > > > > > > > > > > > > > > Which Flink versions does the operator plan to support?
> > > > > > > > > > > > > > > > > > > > > 1. Native K8s deployment was first introduced in Flink 1.10
> > > > > > > > > > > > > > > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > > > > > > > > > > > > > > 3. The Pod template support was introduced in Flink 1.13
> > > > > > > > > > > > > > > > > > > > > 4. There were some changes to the Flink docker image entrypoint script in, IIRC, Flink 1.13
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Great, thanks for providing this.
> It is also important for compatibility going forward. We are
> targeting Flink 1.14.x upwards. Before the operator is ready there
> will be another Flink release. Let's see if anyone is interested in
> earlier versions.
>
> > ## Compatibility
> > What kind of API compatibility can we commit to? It's probably fine
> > to have alpha / beta version APIs that allow incompatible future
> > changes for the first version. But eventually we would need to
> > guarantee backwards compatibility, so that an early version CR can
> > work with a new version operator.
>
> Another great point, and please let me include that on the FLIP page. ;-)
>
> I think we should allow incompatible changes for the first one or two
> versions, similar to how other major features have evolved recently,
> such as FLIP-27.
>
> Would be great to get broader feedback on this one.
>
> Cheers,
> Thomas
>
> > Thank you~
> >
> > Xintong Song
> >
> > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <t...@apache.org> wrote:
> >
> > > Thanks for the feedback!
> > >
> > > > # 1 Flink Native vs Standalone integration
> > > > Maybe we should make this more clear in the FLIP but we agreed to
> > > > do the first version of the operator based on the native
> > > > integration.
> > > > While this clearly does not cover all use-cases and requirements,
> > > > it seems this would lead to a much smaller initial effort and a
> > > > nicer first version.
> > >
> > > I'm also leaning towards the native integration, as long as it
> > > reduces the MVP effort. Ultimately the operator will need to also
> > > support the standalone mode.
> > > I would like to gain more confidence that native integration
> > > reduces the effort. While it cuts the effort to handle the TM pod
> > > creation, some mapping code from the CR to the native integration
> > > client and config needs to be created. As mentioned in the FLIP,
> > > native integration requires the Flink job manager to have access
> > > to the k8s API to create pods, which in some scenarios may be seen
> > > as unfavorable.
> > >
> > > > # Pod Template
> > > > > Is the pod template in the CR the same as what Flink already
> > > > > supports[4]?
> > > > > Then I am afraid that not every field (e.g. cpu/memory
> > > > > resources) could take effect.
> > >
> > > Yes, pod template would look almost identical. There are a few
> > > settings that the operator will control (and that may need to be
> > > blacklisted), but in general we would not want to place
> > > restrictions. I think a mechanism where a pod template is merged
> > > from multiple layers would also be interesting to make this more
> > > flexible.
> > >
> > > Cheers,
> > > Thomas