Re: [DISCUSS] FLINK-16194: Refactor the Kubernetes architecture design

felixzheng zheng Fri, 21 Feb 2020 07:42:23 -0800

Thanks for the feedback @Yang Wang. I would like to discuss some of the
details in depth about why I am confused about the existing design.


Question 1: How do we mount a configuration file?

For the existing design,

   1.

   We need several classes to finish it:
   1.

      InitializerDecorator
      2.

      OwnerReferenceDecorator
      3.

      ConfigMapDecorator
      4.

      KubernetesUtils: providing the getConfigMapVolume method to share for
      the FlinkMasterDeploymentDecorator and the TaskManagerPodDecorator.
      5.

      FlinkMasterDeploymentDecorator: mounts the ConfigMap volume.
      6.

      TaskManagerPodDecorator: mounts the ConfigMap volume.
      7.

      If in the future, someone would like to introduce an init Container,
      the InitContainerDecorator has to mount the ConfigMap volume too.


I am confused about the current solution to mounting a configuration file:

   1.

   Actually, we do not need so many Decorators for mounting a file.
   2.

   If we would like to mount a new file, we have no choice but to repeat
   the same tedious and scattered routine.
   3.

   There’s no easy way to test the file mounting functionality alone; we
   have to construct the ConfigMap, the Deployment or the TaskManagerPod first
   and then do a final test.


The reason why it is so complex to mount a configuration file is that we
don’t fully consider the internal connections among those resources in the
existing design.

The new abstraction we proposed could solve such a kind of problem, the new
Decorator object is as follows:

public interface KubernetesStepDecorator {

  /**

   * Apply transformations to the given FlinkPod in accordance with this
feature. This can include adding

   * labels/annotations, mounting volumes, and setting startup command or
parameters, etc.

   */

  FlinkPod decorateFlinkPod(FlinkPod flinkPod);

  /**

   * Build the accompanying Kubernetes resources that should be introduced
to support this feature. This could

   * only applicable to the client-side submission process.

   */

  List<HasMetadata> buildAccompanyingKubernetesResources() throws
IOException;

}

The FlinkPod is a composition of the Pod, the main Container, the init
Container, and the sidecar Container.

Next, we introduce a KubernetesStepDecorator implementation, the method of
buildAccompanyingKubernetesResources creates the corresponding ConfigMap,
and the method of decorateFlinkPod configures the Volume for the Pod and
all the Containers.

So, for the scenario of mounting a configuration file, the advantages of
this new architecture are as follows:

   1.

   One dedicated KubernetesStepDecorator implementation is enough to finish
   all the mounting work, meanwhile, this class would be shared between the
   client-side and the master-side. Besides that, the number of lines of code
   in that class will not exceed 300 lines which facilitate code readability
   and maintenance.
   2.

   Testing becomes an easy thing now, as we can add a dedicated test class
   for only this class, the test would never rely on the construction of other
   components such as the Deployment.
   3.

   It’s quite convenient to mount a new configuration file via the newly
   dedicated KubernetesStepDecorator implementation.


Question 2: How do we construct the Pod?

For the existing design,

   1.

   The FlinkMasterDeploymentDecorator is responsible for building the
   Deployment and configuring the Pod and the Containers, while the
   TaskManagerPodDecorator is responsible for building the TaskManager Pod.
   Take FlinkMasterDeploymentDecorator as an example, let’s see what it has
   done.
   1.

      Configure the main Container, including the name, command, args,
      image, image pull policy, resource requirements, ports, all kinds of
      environment variables, all kinds of volume mounts, etc.
      2.

      Configure the Pod, including service account, all kinds of volumes,
      and attach all kinds of Container, including the main Container, the init
      Container, and the sidecar Container.
      3.

      Configure the Deployment.



   1.

   The InitializerDecorator and the OwnerReferenceDecorator have basic
   logic so that the most complex work is completed in the
   FlinkMasterDeploymentDecorator and the TaskManagerPodDecorator. With the
   introduction of new features for the Pod, such as customized volume mounts,
   Hadoop configuration support, Kerberized HDFS support, secret mounts,
   Python support, etc. the construction process could become far more
   complicated, and the functionality of a single class could explode, which
   hurts code readability, writability, and testability. Besides that, both
   the client-side and the master-side shares much of the Pod construction
   logic.


So the problems are as follows:

   1.

   We don’t have a consistent abstraction that is applicable to both the
   client-side and the master-side for Pod construction (including the
   Containers) so that we have to share some of the code via tool classes
   which is a trick solution, however, we can’t share most of the code as much
   as possible.
   2.

   We don’t have a step-by-step orchestrator architecture to help the Pod
   construction.


For the proposed design,

   1.

   The problems above are all solved: we have a consistent abstraction that
   leads to a monadic-step orchestrator architecture to construct the Pod step
   by step. One step is responsible for exactly one thing, following that is
   the fact that every step is well testable, because each unit can be
   considerably small, and the number of branches to test in any given step is
   limited. Moreover, much of the step could be shared between the client-side
   and the master-side, such as configuration files mount, etc.


Question 3: Could all the resources share the same orchestrator
architecture?

For the existing design,

   1.

   We don’t have a unified orchestrator architecture for the construction
   of all the Kubernetes resources, therefore, we need a Decorator chain for
   every Kubernetes resource.


For the proposed design,

   1.

   Following Question 1 ~ 2 and the design doc[1], we have introduced a
   monadic-step orchestrator architecture to construct the Pod and all the
   Containers. Besides that, this architecture also works for the other
   resources, such as the Services and the ConfigMaps. For example, if we need
   a new Service or a ConfigMap, just introduce a new step.



Question 4: How do we introduce the InitContainer or the side-car Container?

For the existing design,

   1.

   Both the client-side and the master-side introduce some new Decorators,

the client-side chain could be:

InitializerDecorator -> OwnerReferenceDecorator ->
FlinkMasterDeploymentDecorator -> InitContainerDecorator ->
SidecarDecorator -> etc

-

and the master-side could be:

InitializerDecorator -> OwnerReferenceDecorator  -> TaskManagerPodDecorator
-> InitContainerDecorator -> SidecarDecorator -> etc

As we can see, the FlinkMasterDeploymentDecorator or the
TaskManagerPodDecorator is designed for the Pod construction, including the
Containers, so we don’t need to treat the init Container and the sidecar
Container as special cases different from the main Container by introducing
a dedicated Decorator for every one of them. Such kind of trick solution
confuses me, maybe to the other developers.

For the proposed design,

   1.

   Following Question 1 ~ 4 and the design doc [1], the central premise of
   the proposed orchestrator architecture is that the Pod, the main Container,
   the init Container, the sidecar Container, the Services, and the ConfigMaps
   all sit on an equal footing. For every one of the Pod, the main Container,
   the init Container and the sidecar Container, people could introduce any
   number of steps to finish its construction; we attach all the Containers to
   the Pod in the Builder tool classes after the orchestrator architecture
   constructs them.


Question 5: What is the relation between the Decorators?

For the existing design,

   1.

   The Decorators are not independent; most of them have a strict order in
   the chain.
   2.

   The Decorators do not share common APIs in essence, as their input type
   could be different so that we can’t finish the construction of a Kubernetes
   resource such as the Deployment and the TaskManager Pod through a one-time
   traversal, there are boxing and unboxing overheads among some of the
   neighboring Decorators.


For the proposed design,

   1.

   The steps are completely independent so that they could have arbitrary
   order in the chain; The only rule is that the Hadoop configuration
   Decorator should be the last node in the chain.
   2.

   All the steps share common APIs, with the same input type of all the
   methods, so we can construct a Kubernetes resource via a one-time traversal.


Question 6: What is the number of Decorators chains?

For the existing design,

   1.

   People have to introduce a new chain once they introduce a new
   Kubernetes resource. For example, a new ConfigMap Decorator chain for
   mounting a new configuration file. At this moment, we already have five
   Decorator chains.


For the proposed design,

   1.

   There are always two chains, one is for constructing all the Kubernetes
   resources on the client-side, including the JobManager Pod, the Containers,
   the ConfigMap(s), and the Services(s); the other one is for constructing
   the TaskManager Pod and the Containers.







Yang Wang <[email protected]> 于2020年2月21日周五 下午2:05写道：

>  Hi Canbing,
>
>
> Thanks a lot for sharing your thoughts to improve the Flink on K8s native
> integration.
> Frankly speaking, your discussion title confuses me and i am wondering
> whether you
> want to refactor the whole design. However, after i dive into the details
> and the provided
> document, i realize that mostly you want to improve the implementation.
>
>
> Regarding your two main points.
>
> >> Introduce a unified monadic-step based orchestrator architecture that
> has a better,
> cleaner and consistent abstraction for the Kubernetes resources
> construction process,
> both applicable to the client side and the master side.
>
> When i introduce the decorator for the K8s in Flink, there is always a
> guideline in my mind
> that it should be easy for extension and adding new features. Just as you
> say, we have lots
> of functions to support and the K8s is also evolving very fast. The current
> `ConfigMapDecorator`,
> `FlinkMasterDeploymentDecorator`, `TaskManagerPodDecorator` is a basic
> implementation with
> some prerequisite parameters. Of course we could chain more decorators to
> construct the K8s
> resources. For example, InitializerDecorator -> OwnerReferenceDecorator ->
> FlinkMasterDeploymentDecorator -> InitContainerDecorator ->
> SidecarDecorator -> etc.
>
> I am little sceptical about splitting every parameter into a single
> decorator.  Since it does
> not take too much benefits. But i agree with moving some common parameters
> into separate
> decorators(e.g. volume mount). Also introducing the `~Builder` class and
> spinning off the chaining
> decorator calls from `Fabric8FlinkKubeClient` make sense to me.
>
>
>
> >> Add some dedicated tools for centrally parsing, verifying, and managing
> the Kubernetes parameters.
>
> Currently, we always get the parameters directly from Flink configuration(
> e.g. `flinkConfig.getString(KubernetesConfigOptions.CONTAINER_IMAGE)`). I
> think it could be improved
> by introducing some dedicated conf parser classes. It is a good idea.
>
>
> Best,
> Yang
>
>
>
>
> felixzheng zheng <[email protected]> 于2020年2月21日周五 上午9:31写道：
>
> > Hi everyone,
> >
> > I would like to kick off a discussion on refactoring the existing
> > Kubernetes resources construction architecture design.
> >
> > I created a design document [1] that clarifies our motivation to do this
> > and some improvement proposals for the new design.
> >
> > Briefly, we would like to
> > 1. Introduce a unified monadic-step based orchestrator architecture that
> > has a better, cleaner and consistent abstraction for the Kubernetes
> > resources construction process, both applicable to the client side and
> the
> > master side.
> > 2. Add some dedicated tools for centrally parsing, verifying, and
> managing
> > the Kubernetes parameters.
> >
> > It would be great to start the efforts before adding big features for the
> > native Kubernetes submodule, and Tison and I plan to work together for
> this
> > issue.
> >
> > Please let me know your thoughts.
> >
> > Regards,
> > Canbin Zheng
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1dFBjqho8IRyNWxKVhFnTf0qGCsgp72aKON4wUdHY5Pg/edit?usp=sharing
> > [2] https://issues.apache.org/jira/browse/FLINK-16194
> >
>

Re: [DISCUSS] FLINK-16194: Refactor the Kubernetes architecture design

Reply via email to