Hi Peng Yuan! The repo is already created: https://github.com/apache/flink-kubernetes-operator
We will open the PR with the initial prototype later today, stay tuned in this thread! :)

Cheers,
Gyula

On Mon, Feb 14, 2022 at 9:09 AM K Fred <yuanpengf...@gmail.com> wrote:

Hi All,

Has the flink-kubernetes-operator project been created on GitHub?

Peng Yuan

On Wed, Feb 9, 2022 at 1:23 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

I agree with flink-kubernetes-operator as the repo name :) Don't have any better idea.

Gyula

On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <t...@apache.org> wrote:

Hi,

Thanks for the continued feedback and discussion. Looks like we are ready to start a VOTE, I will initiate it shortly.

In parallel it would be good to settle on the repository name. My suggestion would be: flink-kubernetes-operator

I thought "flink-operator" could be a bit misleading since the term operator already has a meaning in Flink. I also considered "flink-k8s-operator", but that would be almost identical to existing operator implementations and could lead to confusion in the future.

Thoughts?

Thanks,
Thomas

On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi Danny,

So far we have been focusing our dev efforts on the initial native implementation with the team. If the discussion and vote go well for this FLIP, we are looking forward to contributing the initial version sometime next week (fingers crossed).

At that point I think we can already start the dev work to support the standalone mode as well, especially if you can dedicate some effort to pushing that side. Working together on this sounds like a great idea and we should start as soon as possible! :)

Cheers,
Gyula

On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <dannycran...@apache.org> wrote:

I have been discussing this one with my team. We are interested in the standalone mode and are willing to contribute towards the implementation. Potentially we can work together to support both modes in parallel?

Thanks,

On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi Danny!

Thanks for the feedback :)

Versioning:
Versioning will be independent of Flink, and each operator version will depend on a fixed Flink version. This is the exact same setup as with Stateful Functions (https://github.com/apache/flink-statefun): an independent release cycle, but still under the Flink umbrella.

Deployment error handling:
I think that's a very good point, as general exception handling for the different failure scenarios is a tricky problem. I think exception classifiers and retry strategies could avoid a lot of manual intervention from the user. We will definitely need to add something like this. Once we have the repo created with the initial operator code we should open some tickets for this and put it on the short-term roadmap!
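For illustration only: a minimal Java sketch of what such pluggable classifiers and retry strategies could look like on the operator side. All names below are hypothetical and not taken from the FLIP or any existing code.

import java.time.Duration;
import java.util.Optional;

/** Decides whether a given submission/deployment failure is worth retrying at all. */
interface ExceptionClassifier {
    boolean isRecoverable(Throwable error);
}

/** Computes the delay before the next attempt, or empty once the operator should give up. */
interface RetryStrategy {
    Optional<Duration> nextDelay(int attempt);
}

/** One possible strategy: capped exponential back-off with a bounded number of attempts. */
final class ExponentialBackoff implements RetryStrategy {
    private final Duration initialDelay;
    private final Duration maxDelay;
    private final int maxAttempts;

    ExponentialBackoff(Duration initialDelay, Duration maxDelay, int maxAttempts) {
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Optional<Duration> nextDelay(int attempt) {
        if (attempt >= maxAttempts) {
            return Optional.empty(); // give up and surface the error for manual handling
        }
        long factor = 1L << Math.min(attempt, 20); // cap the shift to avoid overflow
        long delayMs = Math.min(initialDelay.toMillis() * factor, maxDelay.toMillis());
        return Optional.of(Duration.ofMillis(delayMs));
    }
}

In the reconcile loop, the operator could consult the classifier on a failed submission and, if the error is recoverable, schedule the next attempt after the strategy's delay; non-recoverable errors could be surfaced in the CR status for manual intervention.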
Cheers,
Gyula

On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <dannycran...@apache.org> wrote:

Hey team,

Great work on the FLIP, I am looking forward to this one. I agree that we can move forward to the voting stage.

I have general feedback around how we will handle job submission failure and retry. As discussed in the Rejected Alternatives section, we can use Java to handle job submission failures from the Flink client. It would be useful to have the ability to configure exception classifiers and a retry strategy as part of the operator configuration.

Given this will be in a separate GitHub repository, I am curious how the versioning strategy will work in relation to the Flink version. Do we have any other components with a similar setup I can look at? Will the operator version track Flink, or will it use its own versioning strategy with a Flink version support matrix, or similar?

Thanks,

On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <balassi.mar...@gmail.com> wrote:

Hi team,

Thank you for the great feedback, Thomas has updated the FLIP page accordingly. If you are comfortable with the current design and depth of the FLIP [1], I suggest moving forward to the voting stage - once that reaches a positive conclusion it lets us create the separate code repository under the Flink project for the operator.

I encourage everyone to keep improving the details in the meantime; however, given the existing design and the general sentiment on this thread, I believe the most efficient path from here is starting the implementation so that we can collectively iterate over it.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator

On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <t...@apache.org> wrote:

Hi Xintong,

Thanks for the feedback, please see responses inline below.

On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks Thomas for drafting this FLIP, and everyone for the discussion.
> I also have a few questions and comments.
> ## Job Submission
> Deploying a Flink session cluster via kubectl & CR and then submitting jobs to the cluster via the Flink CLI / REST is probably the approach that requires the least effort. However, I'd like to point out two weaknesses.
> 1. A lot of users use Flink in per-job/application mode. For these users, having to run the job in two steps (deploy the cluster, then submit the job) is not that convenient.
> 2. One of our motivations is being able to manage Flink applications' lifecycles with kubectl. Submitting jobs from the CLI is not aligned with this motivation.
> I think it's probably worth supporting job submission via kubectl & CR in the first version, both together with deploying the cluster (as in per-job/application mode) and after deploying the cluster (as in session mode).

The intention is to support application management through the operator and CR, which means there won't be any two-step submission process, which, as you allude to, would defeat the purpose of this project. The CR example shows the application part. Please note that the bare cluster support is an *additional* feature for scenarios that require external job management. Is there anything on the FLIP page that creates a different impression?
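To illustrate the "application part" of the CR: a rough sketch, with purely hypothetical field names, of the kind of spec a single application-mode resource could carry, so that one kubectl apply creates the cluster and submits the job, and kubectl delete tears both down.

import java.util.List;
import java.util.Map;

/**
 * Hypothetical shape of the spec an application-mode CR could carry.
 * Field names are illustrative only and not taken from the FLIP's schema.
 */
final class FlinkApplicationSpec {
    String image;                            // Flink + user code container image
    String flinkVersion;                     // e.g. "1.14"
    Map<String, String> flinkConfiguration;  // flink-conf.yaml style overrides
    JobSpec job;                             // present for application mode; omitted for a bare session cluster

    static final class JobSpec {
        String jarURI;                       // where the operator finds the user jar
        String entryClass;                   // main class of the job
        List<String> args;                   // program arguments
        int parallelism;
    }
}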
> ## Versioning
> Which Flink versions does the operator plan to support?
> 1. Native K8s deployment was first introduced in Flink 1.10
> 2. Native K8s HA was introduced in Flink 1.12
> 3. Pod template support was introduced in Flink 1.13
> 4. There were some changes to the Flink docker image entrypoint script in, IIRC, Flink 1.13

Great, thanks for providing this. It is important for the compatibility story going forward as well. We are targeting Flink 1.14.x upwards. Before the operator is ready there will be another Flink release. Let's see if anyone is interested in earlier versions?

> ## Compatibility
> What kind of API compatibility can we commit to? It's probably fine to have alpha / beta version APIs that allow incompatible future changes for the first version. But eventually we would need to guarantee backwards compatibility, so that an early-version CR can work with a newer-version operator.

Another great point, and please let me include that on the FLIP page. ;-)

I think we should allow incompatible changes for the first one or two versions, similar to how other major features have evolved recently, such as FLIP-27.

Would be great to get broader feedback on this one.

Cheers,
Thomas
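On the compatibility point, a hypothetical sketch (all names invented for illustration) of one way an early-version CR could keep working with a newer operator: the operator converts the old spec into the current schema before reconciling, filling new fields with conservative defaults.

/** Spec as it looked in the first (alpha) API version. */
final class SpecV1Alpha1 {
    String image;
    String jarURI;
    int parallelism;
}

/** Current spec; a field was added in the newer API version. */
final class SpecV1 {
    String image;
    String jarURI;
    int parallelism;
    String upgradeMode; // added later

    /** Old resources keep working because new fields get safe defaults. */
    static SpecV1 fromV1Alpha1(SpecV1Alpha1 old) {
        SpecV1 spec = new SpecV1();
        spec.image = old.image;
        spec.jarURI = old.jarURI;
        spec.parallelism = old.parallelism;
        spec.upgradeMode = "stateless"; // hypothetical default
        return spec;
    }
}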
> Thank you~
> Xintong Song

On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <t...@apache.org> wrote:

Thanks for the feedback!

> # 1 Flink Native vs Standalone integration
> Maybe we should make this more clear in the FLIP, but we agreed to do the first version of the operator based on the native integration. While this clearly does not cover all use cases and requirements, it seems this would lead to a much smaller initial effort and a nicer first version.

I'm also leaning towards the native integration, as long as it reduces the MVP effort. Ultimately the operator will need to also support the standalone mode. I would like to gain more confidence that native integration reduces the effort: while it cuts the effort to handle TM pod creation, some mapping code from the CR to the native integration client and config still needs to be created. As mentioned in the FLIP, native integration requires the Flink job manager to have access to the k8s API to create pods, which in some scenarios may be seen as unfavorable.

> # Pod Template
> Is the pod template in the CR the same as what Flink already supports[4]? If so, I am afraid that arbitrary fields (e.g. cpu/memory resources) may not take effect.

Yes, the pod template would look almost identical. There are a few settings that the operator will control (and that may need to be blacklisted), but in general we would not want to place restrictions. I think a mechanism where a pod template is merged from multiple layers would also be interesting, to make this more flexible.

Cheers,
Thomas
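For illustration of the layered-merge idea: a minimal sketch in plain Java over nested maps, with hypothetical key names and precedence rules, where the user-provided template is overlaid on a base template and operator-controlled settings always win.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of a layered pod-template merge: a base template from the operator,
 * overlaid with the user-provided template, while operator-controlled settings
 * are kept out of the user's hands. Structure and key names are illustrative.
 */
final class PodTemplateMerger {
    // Settings the operator owns; user overrides for these are ignored ("blacklisted").
    private static final Set<String> OPERATOR_CONTROLLED = Set.of("serviceAccountName", "restartPolicy");

    /** Deep-merges overlay into base; overlay wins, except for operator-controlled keys. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> overlay) {
        Map<String, Object> result = new HashMap<>(base);
        overlay.forEach((key, value) -> {
            if (OPERATOR_CONTROLLED.contains(key)) {
                return; // keep the operator's value for controlled settings
            }
            Object existing = result.get(key);
            if (existing instanceof Map && value instanceof Map) {
                result.put(key, merge((Map<String, Object>) existing, (Map<String, Object>) value));
            } else {
                result.put(key, value); // scalars and lists: the user layer overrides
            }
        });
        return result;
    }
}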