Sorry for the late reply. We were out due to the public holidays in China.

@Thomas,

> The intention is to support application management through operator and CR, which means there won't be any 2 step submission process, which as you allude to would defeat the purpose of this project. The CR example shows the application part. Please note that the bare cluster support is an *additional* feature for scenarios that require external job management. Is there anything on the FLIP page that creates a different impression?

Sounds good to me. I don't remember what created the impression of 2 step submission back then. I revisited the latest version of this FLIP and it looks good to me.

@Gyula,

Versioning:
> Versioning will be independent from Flink and the operator will depend on a fixed flink version (in every given operator version).
> This should be the exact same setup as with Stateful Functions (https://github.com/apache/flink-statefun). So independent release cycle but still within the Flink umbrella.

Does this mean if someone wants to upgrade Flink to a version that is released after the operator version that is being used, he/she would need to upgrade the operator version first? I'm not questioning this, just trying to make sure I'm understanding this correctly.

Thank you~

Xintong Song


On Mon, Feb 7, 2022 at 3:14 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Thank you Alexis,
>
> Will definitely check this out. You are right, Kotlin makes it difficult to adopt pieces of this code directly but I think it will be good to get inspiration for the architecture and look at how particular problems have been solved. It will be a great help for us I am sure.
>
> Cheers,
> Gyula
>
> On Sat, Feb 5, 2022 at 12:28 PM Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com> wrote:
>
> > Hi everyone,
> >
> > just wanted to mention that my employer agreed to open source the PoC I developed: https://github.com/MicroFocus/opsb-flink-k8s-operator
> >
> > I understand the concern for maintainability, so Gradle & Kotlin might not be appealing to you, but at least it gives you another reference. The Helm resources in particular might be useful.
> >
> > There are bits and pieces there referring to Flink sessions, but those are just placeholders, the functioning parts use application mode with native integration.
> >
> > Regards,
> > Alexis.
> >
> > ________________________________
> > From: Thomas Weise <t...@apache.org>
> > Sent: Saturday, February 5, 2022 2:41 AM
> > To: dev <dev@flink.apache.org>
> > Subject: Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator
> >
> > Hi,
> >
> > Thanks for the continued feedback and discussion. Looks like we are ready to start a VOTE, I will initiate it shortly.
> >
> > In parallel it would be good to find the repository name.
> >
> > My suggestion would be: flink-kubernetes-operator
> >
> > I thought "flink-operator" could be a bit misleading since the term operator already has a meaning in Flink.
> >
> > I also considered "flink-k8s-operator" but that would be almost identical to existing operator implementations and could lead to confusion in the future.
> >
> > Thoughts?
> >
> > Thanks,
> > Thomas
> >
> > On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > > Hi Danny,
> > >
> > > So far we have been focusing our dev efforts on the initial native implementation with the team.
> > > If the discussion and vote goes well for this FLIP we are looking forward to contributing the initial version sometime next week (fingers crossed).
> > >
> > > At that point I think we can already start the dev work to support the standalone mode as well, especially if you can dedicate some effort to pushing that side.
> > > Working together on this sounds like a great idea and we should start as soon as possible! :)
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <dannycran...@apache.org> wrote:
> > >
> > > > I have been discussing this one with my team. We are interested in the Standalone mode, and are willing to contribute towards the implementation. Potentially we can work together to support both modes in parallel?
> > > >
> > > > Thanks,
> > > >
> > > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >
> > > > > Hi Danny!
> > > > >
> > > > > Thanks for the feedback :)
> > > > >
> > > > > Versioning:
> > > > > Versioning will be independent from Flink and the operator will depend on a fixed flink version (in every given operator version).
> > > > > This should be the exact same setup as with Stateful Functions (https://github.com/apache/flink-statefun). So independent release cycle but still within the Flink umbrella.
> > > > >
> > > > > Deployment error handling:
> > > > > I think that's a very good point, as general exception handling for the different failure scenarios is a tricky problem. I think the exception classifiers and retry strategies could avoid a lot of manual intervention from the user. We will definitely need to add something like this. Once we have the repo created with the initial operator code we should open some tickets for this and put it on the short term roadmap!
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer <dannycran...@apache.org> wrote:
> > > > >
> > > > > > Hey team,
> > > > > >
> > > > > > Great work on the FLIP, I am looking forward to this one. I agree that we can move forward to the voting stage.
> > > > > >
> > > > > > I have general feedback around how we will handle job submission failure and retry. As discussed in the Rejected Alternatives section, we can use Java to handle job submission failures from the Flink client. It would be useful to have the ability to configure exception classifiers and retry strategy as part of operator configuration.
> > > > > >
> > > > > > Given this will be in a separate GitHub repository I am curious how the versioning strategy will work in relation to the Flink version? Do we have any other components with a similar setup I can look at? Will the operator version track Flink or will it use its own versioning strategy with a Flink version support matrix, or similar?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi <balassi.mar...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi team,
> > > > > > >
> > > > > > > Thank you for the great feedback, Thomas has updated the FLIP page accordingly. If you are comfortable with the currently existing design and depth in the FLIP [1] I suggest moving forward to the voting stage - once that reaches a positive conclusion it lets us create the separate code repository under the flink project for the operator.
> > > > > > >
> > > > > > > I encourage everyone to keep improving the details in the meantime, however I believe given the existing design and the general sentiment on this thread that the most efficient path from here is starting the implementation so that we can collectively iterate over it.
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > >
> > > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise <t...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Xintong,
> > > > > > > >
> > > > > > > > Thanks for the feedback and please see responses below -->
> > > > > > > >
> > > > > > > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks Thomas for drafting this FLIP, and everyone for the discussion.
> > > > > > > > >
> > > > > > > > > I also have a few questions and comments.
> > > > > > > > >
> > > > > > > > > ## Job Submission
> > > > > > > > > Deploying a Flink session cluster via kubectl & CR and then submitting jobs to the cluster via Flink cli / REST is probably the approach that requires the least effort. However, I'd like to point out 2 weaknesses.
> > > > > > > > > 1. A lot of users use Flink in perjob/application modes. For these users, having to run the job in two steps (deploy the cluster, and submit the job) is not that convenient.
> > > > > > > > > 2. One of our motivations is being able to manage Flink applications' lifecycles with kubectl.
> > > > > > > > > Submitting jobs from cli sounds not aligned with this motivation.
> > > > > > > > > I think it's probably worth it to support submitting jobs via kubectl & CR in the first version, both together with deploying the cluster like in perjob/application mode and after deploying the cluster like in session mode.
> > > > > > > >
> > > > > > > > The intention is to support application management through operator and CR, which means there won't be any 2 step submission process, which as you allude to would defeat the purpose of this project. The CR example shows the application part. Please note that the bare cluster support is an *additional* feature for scenarios that require external job management. Is there anything on the FLIP page that creates a different impression?
> > > > > > > >
> > > > > > > > > ## Versioning
> > > > > > > > > Which Flink versions does the operator plan to support?
> > > > > > > > > 1. Native K8s deployment was first introduced in Flink 1.10
> > > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > > 3. The Pod template support was introduced in Flink 1.13
> > > > > > > > > 4. There were some changes to the Flink docker image entrypoint script in, IIRC, Flink 1.13
> > > > > > > >
> > > > > > > > Great, thanks for providing this. It is important for the compatibility going forward also. We are targeting Flink 1.14.x upwards. Before the operator is ready there will be another Flink release.
> > > > > > > > Let's see if anyone is interested in earlier versions?
> > > > > > > >
> > > > > > > > > ## Compatibility
> > > > > > > > > What kind of API compatibility can we commit to? It's probably fine to have alpha / beta version APIs that allow incompatible future changes for the first version. But eventually we would need to guarantee backwards compatibility, so that an early version CR can work with a new version operator.
> > > > > > > >
> > > > > > > > Another great point and please let me include that on the FLIP page. ;-)
> > > > > > > >
> > > > > > > > I think we should allow incompatible changes for the first one or two versions, similar to how other major features have evolved recently, such as FLIP-27.
> > > > > > > >
> > > > > > > > Would be great to get broader feedback on this one.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Thomas
> > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise <t...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the feedback!
> > > > > > > > > >
> > > > > > > > > > > # 1 Flink Native vs Standalone integration
> > > > > > > > > > > Maybe we should make this more clear in the FLIP but we agreed to do the first version of the operator based on the native integration.
> > > > > > > > > > > While this clearly does not cover all use-cases and requirements, it seems this would lead to a much smaller initial effort and a nicer first version.
> > > > > > > > > >
> > > > > > > > > > I'm also leaning towards the native integration, as long as it reduces the MVP effort. Ultimately the operator will need to also support the standalone mode. I would like to gain more confidence that native integration reduces the effort. While it cuts the effort to handle the TM pod creation, some mapping code from the CR to the native integration client and config needs to be created. As mentioned in the FLIP, native integration requires the Flink job manager to have access to the k8s API to create pods, which in some scenarios may be seen as unfavorable.
> > > > > > > > > >
> > > > > > > > > > > > > # Pod Template
> > > > > > > > > > > > > Is the pod template in CR the same as what Flink already supports[4]?
> > > > > > > > > > > > > Then I am afraid that not every arbitrary field (e.g. cpu/memory resources) could take effect.
> > > > > > > > > >
> > > > > > > > > > Yes, pod template would look almost identical. There are a few settings that the operator will control (and that may need to be blacklisted), but in general we would not want to place restrictions.
> > > > > > > > > > I think a mechanism where a pod template is merged from multiple layers would also be interesting to make this more flexible.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Thomas
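
For readers following the pod template discussion at the bottom of the quoted thread: the idea of merging a pod template from multiple layers, with a few operator-controlled settings excluded, could look roughly like the sketch below. This is only an illustration in Python under stated assumptions, not the FLIP's design. The names `merge_layers` and `OPERATOR_MANAGED`, and the choice of `spec.restartPolicy` as an operator-controlled field, are hypothetical; lists and scalars are replaced wholesale rather than merged element by element.

```python
from copy import deepcopy

# Hypothetical: paths the operator would set itself and therefore ignore
# in user-supplied layers. The thread does not name the actual fields.
OPERATOR_MANAGED = {("spec", "restartPolicy")}


def merge_layers(*layers, managed=OPERATOR_MANAGED):
    """Deep-merge pod template layers; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        _merge_into(merged, layer, (), managed)
    return merged


def _merge_into(target, source, path, managed):
    for key, value in source.items():
        current = path + (key,)
        if current in managed:
            continue  # operator-controlled setting: drop the user value
        if isinstance(value, dict):
            child = target.get(key)
            if not isinstance(child, dict):
                child = target[key] = {}
            _merge_into(child, value, current, managed)
        else:
            # Scalars and lists are replaced wholesale (a simplification).
            target[key] = deepcopy(value)


# Usage: a cluster-wide base layer plus a per-application layer.
base = {
    "metadata": {"labels": {"team": "data"}},
    "spec": {"restartPolicy": "Never", "nodeSelector": {"disk": "ssd"}},
}
app = {
    "metadata": {"labels": {"app": "wordcount"}},
    "spec": {"nodeSelector": {"zone": "a"}},
}
merged = merge_layers(base, app)
```

Here labels and node selectors from both layers end up in `merged`, while the hypothetical operator-managed `restartPolicy` is dropped so that the operator can set it itself. A real implementation would probably need name-aware merging for container lists, which this sketch leaves out.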