Thank Canbin for starting the discussion and thanks for participating so far.
I agree that we can start with the "Parameters Parser" part and actually split it into a self-contained issue. Best, tison. Yang Wang <danrtsey...@gmail.com> 于2020年2月24日周一 下午8:38写道: > It seems that we could benefit a lot from the "Parameters Parser". Let's > start to > add the dedicated jobmanager/taskmanager config parser classes. > > > Best, > Yang > > Canbin Zheng <felixzhen...@gmail.com> 于2020年2月24日周一 上午10:39写道: > >> Hi, Yang Wang, >> >> Thanks for the feedback. >> >> > Parameters Parser >> >> I can think of many benefits of the dedicated parameter parsing/verifying >> tools, here is some of them: >> 1. Reuse the parameters parsing and verifying code. >> 2. Ensure consistent handling logic for the same setting. >> 3. Simplify some of the method's signature because we can avoid defining >> too many unnecessary input parameters. >> >> > BTW, the fabric8 kubernetes-client and Kubernetes api server already has >> the parameters check before starting to create the resources. I think the >> exceptions >> are usually meaning and enough for the users to get the root cause. >> >> 1. On the one hand, there are gaps between the declarative model used by >> Kubernetes and the configuration model used by Flink. Indeed, the >> Kubernetes client/server helps check the Kubernetes side parameters and >> throw exceptions in case of failures; however, the exception messages are >> not always easily understood from the perspective of Flink users. >> Independent of the solutions, what we should make sure is that the >> exception information in the Flink configuration model is meaningful enough >> for the Flink users. >> 2. On the other hand, some prechecking is beneficial in some scenarios >> since we can avoid useless effort on Kubernetes resource creation in case >> of failures leading to clean up all the created resources. >> >> >> Regards, >> Canbin Zheng >> >> Yang Wang <danrtsey...@gmail.com> 于2020年2月23日周日 下午10:37写道: >> >>> > The new introduced decorator >>> After some offline discussion with Canbin and tison, i totally understand >>> the evolved decorator design. Each decorator will be self-contained and >>> is responsible for just one thing. Currently, if we want to mount a new >>> config >>> file to jobmanager/taskmanager, then both `ConfigMapDecorator`, >>> `JobManagerDecorator`, `TaskManagerDecorator` needs to be updated. >>> It is not convenient for the new contributors to do this. In the new >>> design, >>> by leveraging the accompanying kubernetes resources in decorator, we >>> could finish the creating and mounting config map in one decorator. >>> >>> Since now we just have a basic implementation for native Kubernetes >>> integration and lots of features need to be developed. And many users >>> want to participate in and contribute to the integration. So i agree to >>> refactor >>> the current decorator implementation and make it easier for the new >>> contributor. >>> >>> For the detailed divergences(naming, etc.), i think we could discuss in >>> the PR. >>> >>> > Parameters Parser >>> Currently, the decorator directly use Flink configuration to get the >>> parameters >>> to build the Kubernetes resource. It is straightforward, however we >>> could not >>> have a unified parameters check. So i am not sure whether you will >>> introduce >>> a tool to check the parameters or just simply have our own basic check. >>> >>> BTW, the fabric8 kubernetes-client and Kubernetes api server already has >>> the parameters check before starting to create the resources. I think >>> the exceptions >>> are usually meaning and enough for the users to get the root cause. >>> >>> >>> >>> Best, >>> Yang >>> >>> >>> >>> >>> felixzheng zheng <felixzhen...@gmail.com> 于2020年2月22日周六 上午10:54写道: >>> >>>> Great thanks for the quick feedback Till. You are right; it is not a >>>> fundamentally different approach compared to >>>> what we have right now, all the Kubernetes resources created are the >>>> same, >>>> we aim to evolve the existing decorator approach so that, >>>> 1. the decorators are monadic and smaller in size and functionality. >>>> 2. the new decorator design allows reusing the decorators between the >>>> client and the cluster as much as possible. >>>> 3. all the decorators are independent with each other, and they could >>>> have >>>> arbitrary order in the chain, they share the same APIs and follow a >>>> unified >>>> orchestrator architecture so that new developers could quickly >>>> understand >>>> what should be done to introduce a new feature. >>>> >>>> Besides that, the new approach allows us adding tests for every >>>> decorator >>>> alone instead of doing a final test of all the decorators in the >>>> Fabric8ClientTest.java. >>>> >>>> Cheers, >>>> Canbin Zheng >>>> >>>> Till Rohrmann <trohrm...@apache.org> 于2020年2月22日周六 上午12:28写道: >>>> >>>> > Thanks for starting this discussion Canbin. If I understand your >>>> proposal >>>> > correctly, then you would like to evolve the existing decorator >>>> approach so >>>> > that decorators are monadic and smaller in size and functionality. The >>>> > latter aspect will allow to reuse them between the client and the >>>> cluster. >>>> > Just to make sure, it is not a fundamentally different approach >>>> compared to >>>> > what we have right now, is it? >>>> > >>>> > If this is the case, then I think it makes sense to reuse code as >>>> much as >>>> > possible and to create small code units which are easier to test. >>>> > >>>> > Cheers, >>>> > Till >>>> > >>>> > On Fri, Feb 21, 2020 at 4:41 PM felixzheng zheng < >>>> felixzhen...@gmail.com> >>>> > wrote: >>>> > >>>> > > Thanks for the feedback @Yang Wang. I would like to discuss some of >>>> the >>>> > > details in depth about why I am confused about the existing design. >>>> > > >>>> > > Question 1: How do we mount a configuration file? >>>> > > >>>> > > For the existing design, >>>> > > >>>> > > 1. >>>> > > >>>> > > We need several classes to finish it: >>>> > > 1. >>>> > > >>>> > > InitializerDecorator >>>> > > 2. >>>> > > >>>> > > OwnerReferenceDecorator >>>> > > 3. >>>> > > >>>> > > ConfigMapDecorator >>>> > > 4. >>>> > > >>>> > > KubernetesUtils: providing the getConfigMapVolume method to >>>> share >>>> > for >>>> > > the FlinkMasterDeploymentDecorator and the >>>> TaskManagerPodDecorator. >>>> > > 5. >>>> > > >>>> > > FlinkMasterDeploymentDecorator: mounts the ConfigMap volume. >>>> > > 6. >>>> > > >>>> > > TaskManagerPodDecorator: mounts the ConfigMap volume. >>>> > > 7. >>>> > > >>>> > > If in the future, someone would like to introduce an init >>>> > Container, >>>> > > the InitContainerDecorator has to mount the ConfigMap volume >>>> too. >>>> > > >>>> > > >>>> > > I am confused about the current solution to mounting a configuration >>>> > file: >>>> > > >>>> > > 1. >>>> > > >>>> > > Actually, we do not need so many Decorators for mounting a file. >>>> > > 2. >>>> > > >>>> > > If we would like to mount a new file, we have no choice but to >>>> repeat >>>> > > the same tedious and scattered routine. >>>> > > 3. >>>> > > >>>> > > There’s no easy way to test the file mounting functionality >>>> alone; we >>>> > > have to construct the ConfigMap, the Deployment or the >>>> TaskManagerPod >>>> > > first >>>> > > and then do a final test. >>>> > > >>>> > > >>>> > > The reason why it is so complex to mount a configuration file is >>>> that we >>>> > > don’t fully consider the internal connections among those resources >>>> in >>>> > the >>>> > > existing design. >>>> > > >>>> > > The new abstraction we proposed could solve such a kind of problem, >>>> the >>>> > new >>>> > > Decorator object is as follows: >>>> > > >>>> > > public interface KubernetesStepDecorator { >>>> > > >>>> > > /** >>>> > > >>>> > > * Apply transformations to the given FlinkPod in accordance with >>>> this >>>> > > feature. This can include adding >>>> > > >>>> > > * labels/annotations, mounting volumes, and setting startup >>>> command or >>>> > > parameters, etc. >>>> > > >>>> > > */ >>>> > > >>>> > > FlinkPod decorateFlinkPod(FlinkPod flinkPod); >>>> > > >>>> > > /** >>>> > > >>>> > > * Build the accompanying Kubernetes resources that should be >>>> > introduced >>>> > > to support this feature. This could >>>> > > >>>> > > * only applicable to the client-side submission process. >>>> > > >>>> > > */ >>>> > > >>>> > > List<HasMetadata> buildAccompanyingKubernetesResources() throws >>>> > > IOException; >>>> > > >>>> > > } >>>> > > >>>> > > The FlinkPod is a composition of the Pod, the main Container, the >>>> init >>>> > > Container, and the sidecar Container. >>>> > > >>>> > > Next, we introduce a KubernetesStepDecorator implementation, the >>>> method >>>> > of >>>> > > buildAccompanyingKubernetesResources creates the corresponding >>>> ConfigMap, >>>> > > and the method of decorateFlinkPod configures the Volume for the >>>> Pod and >>>> > > all the Containers. >>>> > > >>>> > > So, for the scenario of mounting a configuration file, the >>>> advantages of >>>> > > this new architecture are as follows: >>>> > > >>>> > > 1. >>>> > > >>>> > > One dedicated KubernetesStepDecorator implementation is enough to >>>> > finish >>>> > > all the mounting work, meanwhile, this class would be shared >>>> between >>>> > the >>>> > > client-side and the master-side. Besides that, the number of >>>> lines of >>>> > > code >>>> > > in that class will not exceed 300 lines which facilitate code >>>> > > readability >>>> > > and maintenance. >>>> > > 2. >>>> > > >>>> > > Testing becomes an easy thing now, as we can add a dedicated test >>>> > class >>>> > > for only this class, the test would never rely on the >>>> construction of >>>> > > other >>>> > > components such as the Deployment. >>>> > > 3. >>>> > > >>>> > > It’s quite convenient to mount a new configuration file via the >>>> newly >>>> > > dedicated KubernetesStepDecorator implementation. >>>> > > >>>> > > >>>> > > Question 2: How do we construct the Pod? >>>> > > >>>> > > For the existing design, >>>> > > >>>> > > 1. >>>> > > >>>> > > The FlinkMasterDeploymentDecorator is responsible for building >>>> the >>>> > > Deployment and configuring the Pod and the Containers, while the >>>> > > TaskManagerPodDecorator is responsible for building the >>>> TaskManager >>>> > Pod. >>>> > > Take FlinkMasterDeploymentDecorator as an example, let’s see >>>> what it >>>> > has >>>> > > done. >>>> > > 1. >>>> > > >>>> > > Configure the main Container, including the name, command, >>>> args, >>>> > > image, image pull policy, resource requirements, ports, all >>>> kinds >>>> > of >>>> > > environment variables, all kinds of volume mounts, etc. >>>> > > 2. >>>> > > >>>> > > Configure the Pod, including service account, all kinds of >>>> volumes, >>>> > > and attach all kinds of Container, including the main >>>> Container, >>>> > the >>>> > > init >>>> > > Container, and the sidecar Container. >>>> > > 3. >>>> > > >>>> > > Configure the Deployment. >>>> > > >>>> > > >>>> > > >>>> > > 1. >>>> > > >>>> > > The InitializerDecorator and the OwnerReferenceDecorator have >>>> basic >>>> > > logic so that the most complex work is completed in the >>>> > > FlinkMasterDeploymentDecorator and the TaskManagerPodDecorator. >>>> With >>>> > the >>>> > > introduction of new features for the Pod, such as customized >>>> volume >>>> > > mounts, >>>> > > Hadoop configuration support, Kerberized HDFS support, secret >>>> mounts, >>>> > > Python support, etc. the construction process could become far >>>> more >>>> > > complicated, and the functionality of a single class could >>>> explode, >>>> > > which >>>> > > hurts code readability, writability, and testability. Besides >>>> that, >>>> > both >>>> > > the client-side and the master-side shares much of the Pod >>>> > construction >>>> > > logic. >>>> > > >>>> > > >>>> > > So the problems are as follows: >>>> > > >>>> > > 1. >>>> > > >>>> > > We don’t have a consistent abstraction that is applicable to >>>> both the >>>> > > client-side and the master-side for Pod construction (including >>>> the >>>> > > Containers) so that we have to share some of the code via tool >>>> classes >>>> > > which is a trick solution, however, we can’t share most of the >>>> code as >>>> > > much >>>> > > as possible. >>>> > > 2. >>>> > > >>>> > > We don’t have a step-by-step orchestrator architecture to help >>>> the Pod >>>> > > construction. >>>> > > >>>> > > >>>> > > For the proposed design, >>>> > > >>>> > > 1. >>>> > > >>>> > > The problems above are all solved: we have a consistent >>>> abstraction >>>> > that >>>> > > leads to a monadic-step orchestrator architecture to construct >>>> the Pod >>>> > > step >>>> > > by step. One step is responsible for exactly one thing, >>>> following that >>>> > > is >>>> > > the fact that every step is well testable, because each unit can >>>> be >>>> > > considerably small, and the number of branches to test in any >>>> given >>>> > > step is >>>> > > limited. Moreover, much of the step could be shared between the >>>> > > client-side >>>> > > and the master-side, such as configuration files mount, etc. >>>> > > >>>> > > >>>> > > Question 3: Could all the resources share the same orchestrator >>>> > > architecture? >>>> > > >>>> > > For the existing design, >>>> > > >>>> > > 1. >>>> > > >>>> > > We don’t have a unified orchestrator architecture for the >>>> construction >>>> > > of all the Kubernetes resources, therefore, we need a Decorator >>>> chain >>>> > > for >>>> > > every Kubernetes resource. >>>> > > >>>> > > >>>> > > For the proposed design, >>>> > > >>>> > > 1. >>>> > > >>>> > > Following Question 1 ~ 2 and the design doc[1], we have >>>> introduced a >>>> > > monadic-step orchestrator architecture to construct the Pod and >>>> all >>>> > the >>>> > > Containers. Besides that, this architecture also works for the >>>> other >>>> > > resources, such as the Services and the ConfigMaps. For example, >>>> if we >>>> > > need >>>> > > a new Service or a ConfigMap, just introduce a new step. >>>> > > >>>> > > >>>> > > >>>> > > Question 4: How do we introduce the InitContainer or the side-car >>>> > > Container? >>>> > > >>>> > > For the existing design, >>>> > > >>>> > > 1. >>>> > > >>>> > > Both the client-side and the master-side introduce some new >>>> > Decorators, >>>> > > >>>> > > the client-side chain could be: >>>> > > >>>> > > InitializerDecorator -> OwnerReferenceDecorator -> >>>> > > FlinkMasterDeploymentDecorator -> InitContainerDecorator -> >>>> > > SidecarDecorator -> etc >>>> > > >>>> > > - >>>> > > >>>> > > and the master-side could be: >>>> > > >>>> > > InitializerDecorator -> OwnerReferenceDecorator -> >>>> > TaskManagerPodDecorator >>>> > > -> InitContainerDecorator -> SidecarDecorator -> etc >>>> > > >>>> > > As we can see, the FlinkMasterDeploymentDecorator or the >>>> > > TaskManagerPodDecorator is designed for the Pod construction, >>>> including >>>> > the >>>> > > Containers, so we don’t need to treat the init Container and the >>>> sidecar >>>> > > Container as special cases different from the main Container by >>>> > introducing >>>> > > a dedicated Decorator for every one of them. Such kind of trick >>>> solution >>>> > > confuses me, maybe to the other developers. >>>> > > >>>> > > For the proposed design, >>>> > > >>>> > > 1. >>>> > > >>>> > > Following Question 1 ~ 4 and the design doc [1], the central >>>> premise >>>> > of >>>> > > the proposed orchestrator architecture is that the Pod, the main >>>> > > Container, >>>> > > the init Container, the sidecar Container, the Services, and the >>>> > > ConfigMaps >>>> > > all sit on an equal footing. For every one of the Pod, the main >>>> > > Container, >>>> > > the init Container and the sidecar Container, people could >>>> introduce >>>> > any >>>> > > number of steps to finish its construction; we attach all the >>>> > > Containers to >>>> > > the Pod in the Builder tool classes after the orchestrator >>>> > architecture >>>> > > constructs them. >>>> > > >>>> > > >>>> > > Question 5: What is the relation between the Decorators? >>>> > > >>>> > > For the existing design, >>>> > > >>>> > > 1. >>>> > > >>>> > > The Decorators are not independent; most of them have a strict >>>> order >>>> > in >>>> > > the chain. >>>> > > 2. >>>> > > >>>> > > The Decorators do not share common APIs in essence, as their >>>> input >>>> > type >>>> > > could be different so that we can’t finish the construction of a >>>> > > Kubernetes >>>> > > resource such as the Deployment and the TaskManager Pod through a >>>> > > one-time >>>> > > traversal, there are boxing and unboxing overheads among some of >>>> the >>>> > > neighboring Decorators. >>>> > > >>>> > > >>>> > > For the proposed design, >>>> > > >>>> > > 1. >>>> > > >>>> > > The steps are completely independent so that they could have >>>> arbitrary >>>> > > order in the chain; The only rule is that the Hadoop >>>> configuration >>>> > > Decorator should be the last node in the chain. >>>> > > 2. >>>> > > >>>> > > All the steps share common APIs, with the same input type of all >>>> the >>>> > > methods, so we can construct a Kubernetes resource via a one-time >>>> > > traversal. >>>> > > >>>> > > >>>> > > Question 6: What is the number of Decorators chains? >>>> > > >>>> > > For the existing design, >>>> > > >>>> > > 1. >>>> > > >>>> > > People have to introduce a new chain once they introduce a new >>>> > > Kubernetes resource. For example, a new ConfigMap Decorator >>>> chain for >>>> > > mounting a new configuration file. At this moment, we already >>>> have >>>> > five >>>> > > Decorator chains. >>>> > > >>>> > > >>>> > > For the proposed design, >>>> > > >>>> > > 1. >>>> > > >>>> > > There are always two chains, one is for constructing all the >>>> > Kubernetes >>>> > > resources on the client-side, including the JobManager Pod, the >>>> > > Containers, >>>> > > the ConfigMap(s), and the Services(s); the other one is for >>>> > constructing >>>> > > the TaskManager Pod and the Containers. >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > Yang Wang <danrtsey...@gmail.com> 于2020年2月21日周五 下午2:05写道: >>>> > > >>>> > > > Hi Canbing, >>>> > > > >>>> > > > >>>> > > > Thanks a lot for sharing your thoughts to improve the Flink on K8s >>>> > native >>>> > > > integration. >>>> > > > Frankly speaking, your discussion title confuses me and i am >>>> wondering >>>> > > > whether you >>>> > > > want to refactor the whole design. However, after i dive into the >>>> > details >>>> > > > and the provided >>>> > > > document, i realize that mostly you want to improve the >>>> implementation. >>>> > > > >>>> > > > >>>> > > > Regarding your two main points. >>>> > > > >>>> > > > >> Introduce a unified monadic-step based orchestrator >>>> architecture >>>> > that >>>> > > > has a better, >>>> > > > cleaner and consistent abstraction for the Kubernetes resources >>>> > > > construction process, >>>> > > > both applicable to the client side and the master side. >>>> > > > >>>> > > > When i introduce the decorator for the K8s in Flink, there is >>>> always a >>>> > > > guideline in my mind >>>> > > > that it should be easy for extension and adding new features. >>>> Just as >>>> > you >>>> > > > say, we have lots >>>> > > > of functions to support and the K8s is also evolving very fast. >>>> The >>>> > > current >>>> > > > `ConfigMapDecorator`, >>>> > > > `FlinkMasterDeploymentDecorator`, `TaskManagerPodDecorator` is a >>>> basic >>>> > > > implementation with >>>> > > > some prerequisite parameters. Of course we could chain more >>>> decorators >>>> > to >>>> > > > construct the K8s >>>> > > > resources. For example, InitializerDecorator -> >>>> OwnerReferenceDecorator >>>> > > -> >>>> > > > FlinkMasterDeploymentDecorator -> InitContainerDecorator -> >>>> > > > SidecarDecorator -> etc. >>>> > > > >>>> > > > I am little sceptical about splitting every parameter into a >>>> single >>>> > > > decorator. Since it does >>>> > > > not take too much benefits. But i agree with moving some common >>>> > > parameters >>>> > > > into separate >>>> > > > decorators(e.g. volume mount). Also introducing the `~Builder` >>>> class >>>> > and >>>> > > > spinning off the chaining >>>> > > > decorator calls from `Fabric8FlinkKubeClient` make sense to me. >>>> > > > >>>> > > > >>>> > > > >>>> > > > >> Add some dedicated tools for centrally parsing, verifying, and >>>> > > managing >>>> > > > the Kubernetes parameters. >>>> > > > >>>> > > > Currently, we always get the parameters directly from Flink >>>> > > configuration( >>>> > > > e.g. >>>> > `flinkConfig.getString(KubernetesConfigOptions.CONTAINER_IMAGE)`). I >>>> > > > think it could be improved >>>> > > > by introducing some dedicated conf parser classes. It is a good >>>> idea. >>>> > > > >>>> > > > >>>> > > > Best, >>>> > > > Yang >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > felixzheng zheng <felixzhen...@gmail.com> 于2020年2月21日周五 上午9:31写道: >>>> > > > >>>> > > > > Hi everyone, >>>> > > > > >>>> > > > > I would like to kick off a discussion on refactoring the >>>> existing >>>> > > > > Kubernetes resources construction architecture design. >>>> > > > > >>>> > > > > I created a design document [1] that clarifies our motivation >>>> to do >>>> > > this >>>> > > > > and some improvement proposals for the new design. >>>> > > > > >>>> > > > > Briefly, we would like to >>>> > > > > 1. Introduce a unified monadic-step based orchestrator >>>> architecture >>>> > > that >>>> > > > > has a better, cleaner and consistent abstraction for the >>>> Kubernetes >>>> > > > > resources construction process, both applicable to the client >>>> side >>>> > and >>>> > > > the >>>> > > > > master side. >>>> > > > > 2. Add some dedicated tools for centrally parsing, verifying, >>>> and >>>> > > > managing >>>> > > > > the Kubernetes parameters. >>>> > > > > >>>> > > > > It would be great to start the efforts before adding big >>>> features for >>>> > > the >>>> > > > > native Kubernetes submodule, and Tison and I plan to work >>>> together >>>> > for >>>> > > > this >>>> > > > > issue. >>>> > > > > >>>> > > > > Please let me know your thoughts. >>>> > > > > >>>> > > > > Regards, >>>> > > > > Canbin Zheng >>>> > > > > >>>> > > > > [1] >>>> > > > > >>>> > > > > >>>> > > > >>>> > > >>>> > >>>> https://docs.google.com/document/d/1dFBjqho8IRyNWxKVhFnTf0qGCsgp72aKON4wUdHY5Pg/edit?usp=sharing >>>> > > > > [2] https://issues.apache.org/jira/browse/FLINK-16194 >>>> > > > > >>>> > > > >>>> > > >>>> > >>>> >>>