It seems we could benefit a lot from the "Parameters Parser". Let's start adding the dedicated jobmanager/taskmanager config parser classes.
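To make the idea concrete, a minimal sketch of what such a dedicated parser class might look like (all class names, option keys, and defaults here are hypothetical illustrations, not the actual classes later added to Flink; a plain Map stands in for Flink's Configuration):

```java
import java.util.Map;

// Hypothetical sketch: one class that parses and verifies all
// jobmanager-related Kubernetes settings in a single place, instead of
// each decorator reading the raw configuration on its own.
class KubernetesJobManagerParameters {

    private final Map<String, String> config;

    KubernetesJobManagerParameters(Map<String, String> config) {
        this.config = config;
    }

    String getContainerImage() {
        String image = config.get("kubernetes.container.image");
        if (image == null || image.trim().isEmpty()) {
            // Fail early with a Flink-side message instead of waiting for
            // the Kubernetes API server to reject the Deployment.
            throw new IllegalArgumentException(
                    "The 'kubernetes.container.image' option must be set.");
        }
        return image;
    }

    int getJobManagerMemoryMb() {
        // Illustrative option key and default, not Flink's real ones.
        String raw = config.getOrDefault("jobmanager.memory.process.size.mb", "1024");
        int mb = Integer.parseInt(raw);
        if (mb <= 0) {
            throw new IllegalArgumentException(
                    "JobManager memory must be positive, got: " + mb);
        }
        return mb;
    }
}
```

Every decorator would then take such a parameters object instead of the raw configuration, which keeps the verification logic in one place and shrinks the decorators' signatures.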
Best,
Yang

Canbin Zheng <felixzhen...@gmail.com> wrote on Mon, Feb 24, 2020, 10:39 AM:

Hi, Yang Wang,

Thanks for the feedback.

> Parameters Parser

I can think of many benefits of dedicated parameter parsing/verifying tools; here are some of them:
1. Reuse the parameter parsing and verifying code.
2. Ensure consistent handling logic for the same setting.
3. Simplify some method signatures, because we can avoid defining too many unnecessary input parameters.

> BTW, the fabric8 kubernetes-client and the Kubernetes API server already check the parameters before starting to create the resources. I think the exceptions are usually meaningful and enough for the users to get the root cause.

1. On the one hand, there are gaps between the declarative model used by Kubernetes and the configuration model used by Flink. Indeed, the Kubernetes client/server helps check the Kubernetes-side parameters and throws exceptions in case of failures; however, the exception messages are not always easily understood from the perspective of Flink users. Independent of the solution, we should make sure that the exception information in the Flink configuration model is meaningful enough for Flink users.
2. On the other hand, some prechecking is beneficial in some scenarios, since we can avoid wasted effort on Kubernetes resource creation when a failure would force us to clean up all the created resources.

Regards,
Canbin Zheng

Yang Wang <danrtsey...@gmail.com> wrote on Sun, Feb 23, 2020, 10:37 PM:

> The newly introduced decorator

After some offline discussion with Canbin and tison, I totally understand the evolved decorator design. Each decorator will be self-contained and responsible for just one thing. Currently, if we want to mount a new config file to the jobmanager/taskmanager, then `ConfigMapDecorator`, `JobManagerDecorator`, and `TaskManagerDecorator` all need to be updated.
It is not convenient for new contributors to do this. In the new design, by leveraging the accompanying Kubernetes resources in a decorator, we could finish creating and mounting the config map in one decorator.

Since we currently have just a basic implementation of the native Kubernetes integration, lots of features still need to be developed, and many users want to participate in and contribute to the integration. So I agree to refactor the current decorator implementation and make it easier for new contributors.

For the detailed divergences (naming, etc.), I think we could discuss them in the PR.

> Parameters Parser

Currently, the decorators directly use the Flink configuration to get the parameters to build the Kubernetes resources. That is straightforward; however, we cannot have a unified parameter check. So I am not sure whether you will introduce a tool to check the parameters or simply have our own basic check.

BTW, the fabric8 kubernetes-client and the Kubernetes API server already check the parameters before starting to create the resources. I think the exceptions are usually meaningful and enough for the users to get the root cause.

Best,
Yang

felixzheng zheng <felixzhen...@gmail.com> wrote on Sat, Feb 22, 2020, 10:54 AM:

Great thanks for the quick feedback, Till. You are right; it is not a fundamentally different approach compared to what we have right now, and all the Kubernetes resources created are the same. We aim to evolve the existing decorator approach so that:
1. The decorators are monadic and smaller in size and functionality.
2. The new decorator design allows reusing the decorators between the client and the cluster as much as possible.
3. All the decorators are independent of each other and could have arbitrary order in the chain; they share the same APIs and follow a unified orchestrator architecture, so that new developers can quickly understand what should be done to introduce a new feature.

Besides that, the new approach allows us to add tests for every decorator alone, instead of doing a final test of all the decorators in Fabric8ClientTest.java.

Cheers,
Canbin Zheng

Till Rohrmann <trohrm...@apache.org> wrote on Sat, Feb 22, 2020, 12:28 AM:

Thanks for starting this discussion Canbin. If I understand your proposal correctly, then you would like to evolve the existing decorator approach so that decorators are monadic and smaller in size and functionality. The latter aspect will allow reusing them between the client and the cluster. Just to make sure, it is not a fundamentally different approach compared to what we have right now, is it?

If this is the case, then I think it makes sense to reuse code as much as possible and to create small code units which are easier to test.

Cheers,
Till

On Fri, Feb 21, 2020 at 4:41 PM felixzheng zheng <felixzhen...@gmail.com> wrote:

Thanks for the feedback @Yang Wang. I would like to discuss some of the details in depth about why I am confused about the existing design.

Question 1: How do we mount a configuration file?

For the existing design, we need several classes to finish it:
1. InitializerDecorator
2. OwnerReferenceDecorator
3. ConfigMapDecorator
4. KubernetesUtils: provides the getConfigMapVolume method shared by the FlinkMasterDeploymentDecorator and the TaskManagerPodDecorator.
5. FlinkMasterDeploymentDecorator: mounts the ConfigMap volume.
6. TaskManagerPodDecorator: mounts the ConfigMap volume.
7. If, in the future, someone would like to introduce an init Container, the InitContainerDecorator would have to mount the ConfigMap volume too.

I am confused about the current solution to mounting a configuration file:
1. Actually, we do not need so many decorators for mounting a file.
2. If we would like to mount a new file, we have no choice but to repeat the same tedious and scattered routine.
3. There's no easy way to test the file-mounting functionality alone; we have to construct the ConfigMap and the Deployment or the TaskManager Pod first and then do a final test.

The reason it is so complex to mount a configuration file is that the existing design does not fully consider the internal connections among those resources.

The new abstraction we proposed could solve this kind of problem; the new decorator interface is as follows:

    public interface KubernetesStepDecorator {

        /**
         * Apply transformations to the given FlinkPod in accordance with this feature.
         * This can include adding labels/annotations, mounting volumes, and setting
         * startup command or parameters, etc.
         */
        FlinkPod decorateFlinkPod(FlinkPod flinkPod);

        /**
         * Build the accompanying Kubernetes resources that should be introduced to
         * support this feature. This may only be applicable to the client-side
         * submission process.
         */
        List<HasMetadata> buildAccompanyingKubernetesResources() throws IOException;

    }

The FlinkPod is a composition of the Pod, the main Container, the init Container, and the sidecar Container.

Next, we introduce a KubernetesStepDecorator implementation whose buildAccompanyingKubernetesResources method creates the corresponding ConfigMap, and whose decorateFlinkPod method configures the Volume for the Pod and all the Containers.

So, for the scenario of mounting a configuration file, the advantages of this new architecture are as follows:
1. One dedicated KubernetesStepDecorator implementation is enough to finish all the mounting work; meanwhile, this class would be shared between the client side and the master side. Besides that, the number of lines of code in that class will not exceed 300, which facilitates code readability and maintenance.
2. Testing becomes easy, as we can add a dedicated test class for only this class; the test would never rely on the construction of other components such as the Deployment.
3. It's quite convenient to mount a new configuration file via a new dedicated KubernetesStepDecorator implementation.

Question 2: How do we construct the Pod?

For the existing design:
1. The FlinkMasterDeploymentDecorator is responsible for building the Deployment and configuring the Pod and the Containers, while the TaskManagerPodDecorator is responsible for building the TaskManager Pod. Take FlinkMasterDeploymentDecorator as an example; let's see what it has to do:
   1. Configure the main Container, including the name, command, args, image, image pull policy, resource requirements, ports, all kinds of environment variables, all kinds of volume mounts, etc.
   2. Configure the Pod, including the service account and all kinds of volumes, and attach all kinds of Containers, including the main Container, the init Container, and the sidecar Container.
   3. Configure the Deployment.
2. The InitializerDecorator and the OwnerReferenceDecorator have basic logic, so the most complex work is completed in the FlinkMasterDeploymentDecorator and the TaskManagerPodDecorator. With the introduction of new features for the Pod, such as customized volume mounts, Hadoop configuration support, Kerberized HDFS support, secret mounts, Python support, etc., the construction process could become far more complicated, and the functionality of a single class could explode, which hurts code readability, writability, and testability. Besides that, the client side and the master side share much of the Pod construction logic.

So the problems are as follows:
1. We don't have a consistent abstraction for Pod construction (including the Containers) that is applicable to both the client side and the master side, so we have to share some of the code via tool classes, which is a hacky solution; even so, we can't share as much of the code as we would like.
2. We don't have a step-by-step orchestrator architecture to help with Pod construction.

For the proposed design, the problems above are all solved: we have a consistent abstraction that leads to a monadic-step orchestrator architecture to construct the Pod step by step. One step is responsible for exactly one thing; it follows that every step is well testable, because each unit can be considerably small and the number of branches to test in any given step is limited. Moreover, many of the steps could be shared between the client side and the master side, such as configuration file mounts, etc.

Question 3: Could all the resources share the same orchestrator architecture?

For the existing design, we don't have a unified orchestrator architecture for the construction of all the Kubernetes resources; therefore, we need a decorator chain for every Kubernetes resource.

For the proposed design, following Questions 1-2 and the design doc [1], we have introduced a monadic-step orchestrator architecture to construct the Pod and all the Containers. Besides that, this architecture also works for the other resources, such as the Services and the ConfigMaps.
For example, if we need a new Service or a ConfigMap, we just introduce a new step.

Question 4: How do we introduce the init Container or the sidecar Container?

For the existing design, both the client side and the master side introduce some new decorators. The client-side chain could be:

InitializerDecorator -> OwnerReferenceDecorator -> FlinkMasterDeploymentDecorator -> InitContainerDecorator -> SidecarDecorator -> etc.

and the master-side chain could be:

InitializerDecorator -> OwnerReferenceDecorator -> TaskManagerPodDecorator -> InitContainerDecorator -> SidecarDecorator -> etc.

As we can see, the FlinkMasterDeploymentDecorator or the TaskManagerPodDecorator is designed for the Pod construction, including the Containers, so we shouldn't need to treat the init Container and the sidecar Container as special cases different from the main Container by introducing a dedicated decorator for every one of them. Such a hacky solution confuses me, and maybe other developers too.

For the proposed design, following Questions 1-4 and the design doc [1], the central premise of the proposed orchestrator architecture is that the Pod, the main Container, the init Container, the sidecar Container, the Services, and the ConfigMaps all sit on an equal footing.
For every one of the Pod, the main Container, the init Container, and the sidecar Container, people could introduce any number of steps to finish its construction; we attach all the Containers to the Pod in the Builder tool classes after the orchestrator architecture constructs them.

Question 5: What is the relation between the decorators?

For the existing design:
1. The decorators are not independent; most of them have a strict order in the chain.
2. The decorators do not share common APIs in essence, as their input types can differ, so we can't finish the construction of a Kubernetes resource such as the Deployment or the TaskManager Pod in a one-time traversal; there are boxing and unboxing overheads between some of the neighboring decorators.

For the proposed design:
1. The steps are completely independent, so they could have arbitrary order in the chain; the only rule is that the Hadoop configuration decorator should be the last node in the chain.
2. All the steps share common APIs, with the same input type for all the methods, so we can construct a Kubernetes resource via a one-time traversal.

Question 6: What is the number of decorator chains?

For the existing design, people have to introduce a new chain once they introduce a new Kubernetes resource; for example, a new ConfigMap decorator chain for mounting a new configuration file. At this moment, we already have five decorator chains.

For the proposed design, there are always two chains: one for constructing all the Kubernetes resources on the client side, including the JobManager Pod, the Containers, the ConfigMap(s), and the Service(s); the other for constructing the TaskManager Pod and its Containers.

Yang Wang <danrtsey...@gmail.com> wrote on Fri, Feb 21, 2020, 2:05 PM:

Hi Canbin,

Thanks a lot for sharing your thoughts on improving the Flink on K8s native integration. Frankly speaking, your discussion title confused me, and I was wondering whether you wanted to refactor the whole design. However, after diving into the details and the provided document, I realized that you mostly want to improve the implementation.

Regarding your two main points:

> Introduce a unified monadic-step based orchestrator architecture that has a better, cleaner and consistent abstraction for the Kubernetes resources construction process, both applicable to the client side and the master side.

When I introduced the decorator for K8s in Flink, there was always a guideline in my mind that it should be easy to extend and to add new features. Just as you say, we have lots of functions to support, and K8s is also evolving very fast. The current `ConfigMapDecorator`, `FlinkMasterDeploymentDecorator`, and `TaskManagerPodDecorator` are a basic implementation with some prerequisite parameters. Of course we could chain more decorators to construct the K8s resources.
For example: InitializerDecorator -> OwnerReferenceDecorator -> FlinkMasterDeploymentDecorator -> InitContainerDecorator -> SidecarDecorator -> etc.

I am a little sceptical about splitting every parameter into a single decorator, since it does not bring much benefit. But I agree with moving some common parameters into separate decorators (e.g. volume mounts). Also, introducing the `~Builder` classes and spinning off the chained decorator calls from `Fabric8FlinkKubeClient` makes sense to me.

> Add some dedicated tools for centrally parsing, verifying, and managing the Kubernetes parameters.

Currently, we always get the parameters directly from the Flink configuration (e.g. `flinkConfig.getString(KubernetesConfigOptions.CONTAINER_IMAGE)`). I think it could be improved by introducing some dedicated conf parser classes. It is a good idea.

Best,
Yang

felixzheng zheng <felixzhen...@gmail.com> wrote on Fri, Feb 21, 2020, 9:31 AM:

Hi everyone,

I would like to kick off a discussion on refactoring the existing Kubernetes resources construction architecture design.

I created a design document [1] that clarifies our motivation to do this and some improvement proposals for the new design.

Briefly, we would like to:
1. Introduce a unified monadic-step based orchestrator architecture that has a better, cleaner, and consistent abstraction for the Kubernetes resources construction process, applicable to both the client side and the master side.
2. Add some dedicated tools for centrally parsing, verifying, and managing the Kubernetes parameters.

It would be great to start these efforts before adding big features to the native Kubernetes submodule; Tison and I plan to work together on this issue.

Please let me know your thoughts.

Regards,
Canbin Zheng

[1] https://docs.google.com/document/d/1dFBjqho8IRyNWxKVhFnTf0qGCsgp72aKON4wUdHY5Pg/edit?usp=sharing
[2] https://issues.apache.org/jira/browse/FLINK-16194
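To make the orchestrator idea from this thread concrete, here is a minimal, self-contained sketch of the step-decorator pattern being proposed: each step is order-independent, does exactly one thing, and may contribute accompanying resources, and a builder applies the whole chain in a one-time traversal. All class names and the string-based resource model are simplifications for illustration, not the actual Flink implementation (which builds fabric8 Pod/Container objects):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the FlinkPod composition described above.
class FlinkPod {
    final Map<String, String> podLabels = new HashMap<>();
    final List<String> mainContainerVolumeMounts = new ArrayList<>();
}

interface KubernetesStepDecorator {
    // Transform the pod; each step is responsible for exactly one thing.
    FlinkPod decorateFlinkPod(FlinkPod flinkPod);

    // Accompanying resources (e.g. a ConfigMap) created by this step;
    // only used on the client side. Resources are plain strings here.
    default List<String> buildAccompanyingKubernetesResources() {
        return new ArrayList<>();
    }
}

// One self-contained step: creates the flink-conf ConfigMap AND mounts it,
// instead of spreading that work across several decorators.
class FlinkConfConfigMapDecorator implements KubernetesStepDecorator {
    @Override
    public FlinkPod decorateFlinkPod(FlinkPod flinkPod) {
        flinkPod.mainContainerVolumeMounts.add("flink-conf-volume:/opt/flink/conf");
        return flinkPod;
    }

    @Override
    public List<String> buildAccompanyingKubernetesResources() {
        List<String> resources = new ArrayList<>();
        resources.add("ConfigMap/flink-conf");
        return resources;
    }
}

class LabelDecorator implements KubernetesStepDecorator {
    @Override
    public FlinkPod decorateFlinkPod(FlinkPod flinkPod) {
        flinkPod.podLabels.put("app", "flink");
        return flinkPod;
    }
}

// The orchestrator: a single one-time traversal over independent steps.
class KubernetesJobManagerBuilder {
    static FlinkPod buildPod(List<KubernetesStepDecorator> steps,
                             List<String> accompanyingResources) {
        FlinkPod pod = new FlinkPod();
        for (KubernetesStepDecorator step : steps) {
            pod = step.decorateFlinkPod(pod);
            accompanyingResources.addAll(step.buildAccompanyingKubernetesResources());
        }
        return pod;
    }
}
```

Because every step shares the same input and output type, testing a step in isolation needs no Deployment or ConfigMap scaffolding, and reordering the chain (except for a constrained step like the Hadoop configuration decorator) does not change the result.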