I agree with flink-kubernetes-operator as the repo name :) Don't have any better idea
Gyula

On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <t...@apache.org> wrote:

> Hi,
>
> Thanks for the continued feedback and discussion. Looks like we are
> ready to start a VOTE; I will initiate it shortly.
>
> In parallel it would be good to find the repository name.
>
> My suggestion would be: flink-kubernetes-operator
>
> I thought "flink-operator" could be a bit misleading since the term
> operator already has a meaning in Flink.
>
> I also considered "flink-k8s-operator" but that would be almost
> identical to existing operator implementations and could lead to
> confusion in the future.
>
> Thoughts?
>
> Thanks,
> Thomas
>
> On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > Hi Danny,
> >
> > So far we have been focusing our dev efforts on the initial native
> > implementation with the team.
> > If the discussion and vote go well for this FLIP we are looking forward
> > to contributing the initial version sometime next week (fingers crossed).
> >
> > At that point I think we can already start the dev work to support the
> > standalone mode as well, especially if you can dedicate some effort to
> > pushing that side.
> > Working together on this sounds like a great idea and we should start as
> > soon as possible! :)
> >
> > Cheers,
> > Gyula
> >
> > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <dannycran...@apache.org>
> > wrote:
> >
> > > I have been discussing this one with my team. We are interested in the
> > > Standalone mode, and are willing to contribute towards the
> > > implementation.
> > > Potentially we can work together to support both modes in parallel?
> > >
> > > Thanks,
> > >
> > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > >
> > > > Hi Danny!
> > > >
> > > > Thanks for the feedback :)
> > > >
> > > > Versioning:
> > > > Versioning will be independent from Flink and the operator will
> > > > depend on a fixed Flink version (in every given operator version).
> > > > This should be the exact same setup as with Stateful Functions
> > > > (https://github.com/apache/flink-statefun). So independent release
> > > > cycle but still within the Flink umbrella.
> > > >
> > > > Deployment error handling:
> > > > I think that's a very good point, as general exception handling for
> > > > the different failure scenarios is a tricky problem. I think the
> > > > exception classifiers and retry strategies could avoid a lot of
> > > > manual intervention from the user. We will definitely need to add
> > > > something like this. Once we have the repo created with the initial
> > > > operator code we should open some tickets for this and put it on the
> > > > short term roadmap!
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <dannycran...@apache.org>
> > > > wrote:
> > > >
> > > > > Hey team,
> > > > >
> > > > > Great work on the FLIP, I am looking forward to this one. I agree
> > > > > that we can move forward to the voting stage.
> > > > >
> > > > > I have general feedback around how we will handle job submission
> > > > > failure and retry. As discussed in the Rejected Alternatives
> > > > > section, we can use Java to handle job submission failures from
> > > > > the Flink client. It would be useful to have the ability to
> > > > > configure exception classifiers and retry strategy as part of
> > > > > operator configuration.
> > > > >
> > > > > Given this will be in a separate GitHub repository I am curious
> > > > > how the versioning strategy will work in relation to the Flink
> > > > > version? Do we have any other components with a similar setup I
> > > > > can look at? Will the operator version track Flink or will it use
> > > > > its own versioning strategy with a Flink version support matrix,
> > > > > or similar?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi
> > > > > <balassi.mar...@gmail.com> wrote:
> > > > >
> > > > > > Hi team,
> > > > > >
> > > > > > Thank you for the great feedback, Thomas has updated the FLIP
> > > > > > page accordingly. If you are comfortable with the currently
> > > > > > existing design and depth in the FLIP [1] I suggest moving
> > > > > > forward to the voting stage - once that reaches a positive
> > > > > > conclusion it lets us create the separate code repository under
> > > > > > the Flink project for the operator.
> > > > > >
> > > > > > I encourage everyone to keep improving the details in the
> > > > > > meantime, however I believe given the existing design and the
> > > > > > general sentiment on this thread that the most efficient path
> > > > > > from here is starting the implementation so that we can
> > > > > > collectively iterate over it.
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > >
> > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <t...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Xintong,
> > > > > > >
> > > > > > > Thanks for the feedback and please see responses below -->
> > > > > > >
> > > > > > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song
> > > > > > > <tonysong...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks Thomas for drafting this FLIP, and everyone for the
> > > > > > > > discussion.
> > > > > > > >
> > > > > > > > I also have a few questions and comments.
> > > > > > > >
> > > > > > > > ## Job Submission
> > > > > > > > Deploying a Flink session cluster via kubectl & CR and then
> > > > > > > > submitting jobs to the cluster via Flink cli / REST is
> > > > > > > > probably the approach that requires the least effort.
> > > > > > > > However, I'd like to point out 2 weaknesses.
> > > > > > > > 1. A lot of users use Flink in perjob/application modes. For
> > > > > > > > these users, having to run the job in two steps (deploy the
> > > > > > > > cluster, and submit the job) is not that convenient.
> > > > > > > > 2. One of our motivations is being able to manage Flink
> > > > > > > > applications' lifecycles with kubectl. Submitting jobs from
> > > > > > > > the cli does not sound aligned with this motivation.
> > > > > > > > I think it's probably worth it to support submitting jobs
> > > > > > > > via kubectl & CR in the first version, both together with
> > > > > > > > deploying the cluster like in perjob/application mode and
> > > > > > > > after deploying the cluster like in session mode.
> > > > > > > >
> > > > > > >
> > > > > > > The intention is to support application management through
> > > > > > > operator and CR, which means there won't be any 2 step
> > > > > > > submission process, which as you allude to would defeat the
> > > > > > > purpose of this project. The CR example shows the application
> > > > > > > part. Please note that the bare cluster support is an
> > > > > > > *additional* feature for scenarios that require external job
> > > > > > > management. Is there anything on the FLIP page that creates a
> > > > > > > different impression?
> > > > > > >
> > > > > > > > ## Versioning
> > > > > > > > Which Flink versions does the operator plan to support?
> > > > > > > > 1. Native K8s deployment was first introduced in Flink 1.10
> > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > 3. The Pod template support was introduced in Flink 1.13
> > > > > > > > 4. There were some changes to the Flink docker image
> > > > > > > > entrypoint script in, IIRC, Flink 1.13
> > > > > > > >
> > > > > > >
> > > > > > > Great, thanks for providing this. It is important for the
> > > > > > > compatibility going forward also. We are targeting Flink
> > > > > > > 1.14.x upwards. Before the operator is ready there will be
> > > > > > > another Flink release. Let's see if anyone is interested in
> > > > > > > earlier versions?
> > > > > > >
> > > > > > > > ## Compatibility
> > > > > > > > What kind of API compatibility can we commit to? It's
> > > > > > > > probably fine to have alpha / beta version APIs that allow
> > > > > > > > incompatible future changes for the first version. But
> > > > > > > > eventually we would need to guarantee backwards
> > > > > > > > compatibility, so that an early version CR can work with a
> > > > > > > > new version operator.
> > > > > > > >
> > > > > > >
> > > > > > > Another great point and please let me include that on the
> > > > > > > FLIP page. ;-)
> > > > > > >
> > > > > > > I think we should allow incompatible changes for the first
> > > > > > > one or two versions, similar to how other major features have
> > > > > > > evolved recently, such as FLIP-27.
> > > > > > >
> > > > > > > Would be great to get broader feedback on this one.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Thomas
> > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise
> > > > > > > > <t...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Thanks for the feedback!
> > > > > > > > >
> > > > > > > > > > # 1 Flink Native vs Standalone integration
> > > > > > > > > > Maybe we should make this more clear in the FLIP but
> > > > > > > > > > we agreed to do the first version of the operator
> > > > > > > > > > based on the native integration.
> > > > > > > > > > While this clearly does not cover all use-cases and
> > > > > > > > > > requirements, it seems this would lead to a much
> > > > > > > > > > smaller initial effort and a nicer first version.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm also leaning towards the native integration, as long
> > > > > > > > > as it reduces the MVP effort. Ultimately the operator
> > > > > > > > > will need to also support the standalone mode. I would
> > > > > > > > > like to gain more confidence that native integration
> > > > > > > > > reduces the effort. While it cuts the effort to handle
> > > > > > > > > the TM pod creation, some mapping code from the CR to the
> > > > > > > > > native integration client and config needs to be created.
> > > > > > > > > As mentioned in the FLIP, native integration requires the
> > > > > > > > > Flink job manager to have access to the k8s API to create
> > > > > > > > > pods, which in some scenarios may be seen as unfavorable.
> > > > > > > > >
> > > > > > > > > > > > # Pod Template
> > > > > > > > > > > > Is the pod template in the CR the same as what
> > > > > > > > > > > > Flink already supports[4]? Then I am afraid that
> > > > > > > > > > > > not every arbitrary field (e.g. cpu/memory
> > > > > > > > > > > > resources) could take effect.
> > > > > > > > >
> > > > > > > > > Yes, pod template would look almost identical. There are
> > > > > > > > > a few settings that the operator will control (and that
> > > > > > > > > may need to be blacklisted), but in general we would not
> > > > > > > > > want to place restrictions. I think a mechanism where a
> > > > > > > > > pod template is merged from multiple layers would also be
> > > > > > > > > interesting to make this more flexible.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Thomas
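
For readers following the pod template point above, here is a minimal sketch of what "merged from multiple layers" could look like. It is only an illustration under stated assumptions: the function names, the three layers, the example field values, and the choice of `serviceAccountName` as an operator-controlled setting are all hypothetical and are not taken from the FLIP or from Flink. Lists are replaced wholesale, which is a simplification.

```python
from copy import deepcopy


def merge_layers(*layers):
    """Deep-merge dict layers left to right; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


def _merge_into(target, source):
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are mappings: merge field by field.
            _merge_into(target[key], value)
        else:
            # Scalars and lists are replaced wholesale (simplifying assumption).
            target[key] = deepcopy(value)


# Hypothetical layers: a shared base template, a per-job template from the CR,
# and settings the operator insists on controlling (applied last).
base = {
    "metadata": {"labels": {"team": "data"}},
    "spec": {"serviceAccountName": "default", "nodeSelector": {"disk": "ssd"}},
}
job = {
    "metadata": {"labels": {"app": "wordcount"}},
    "spec": {"serviceAccountName": "custom"},
}
operator_controlled = {"spec": {"serviceAccountName": "flink-operator"}}

pod = merge_layers(base, job, operator_controlled)
```

Applying the operator-controlled layer last is one way to get the effect of "blacklisted" settings without rejecting the user's template: user values for those fields are simply overridden, while everything else passes through unrestricted.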