> And maybe we also could ping Yikun Jiang who has done similar things in
Spark.

Thanks for the ping, @wangyang. Yes, I was involved in Spark's customized
scheduler support work as the main contributor.

For customized scheduler support, I can share the schedulers' requirements
here:

1. Help *specify* the scheduler name

2. Help create the *scheduler-related labels/annotations/CRDs*, such as:
- Yunikorn needs labels/annotations
<https://yunikorn.apache.org/docs/user_guide/labels_and_annotations_in_yunikorn/>
(and maybe a task group CRD in the future)
- Volcano needs annotations and a CRD <https://volcano.sh/en/docs/podgroup/>
- Kube-batch needs annotations/CRDs
<https://github.com/kubernetes-sigs/kube-batch/tree/master/config/crds>
- Kueue needs annotation support
<https://github.com/kubernetes-sigs/kueue/blob/888cedb6e62c315e008916086308a893cd21dd66/config/samples/sample-job.yaml#L6>
and a cluster-level CRD

3. Help create the scheduler meta/CRD at the *right time*. For example, if
users want to avoid pods staying pending for too long, the CRD required by
the scheduler must be created before pod creation.
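To make requirements 1 and 2 concrete, here is a minimal sketch (Python used
purely for illustration; the real Flink code would live in the Java
decorators) of the pod manifest fields an integration has to fill in. The
scheduler name "volcano" and the "scheduling.k8s.io/group-name" annotation
follow Volcano's PodGroup docs; the helper function itself is hypothetical:

```python
# Hypothetical helper: decorate a pod manifest for the Volcano scheduler.
# The annotation key links the pod to its PodGroup CRD instance.
def decorate_pod_for_volcano(pod: dict, pod_group: str) -> dict:
    # Requirement 1: specify the scheduler name on the pod spec.
    pod.setdefault("spec", {})["schedulerName"] = "volcano"
    # Requirement 2: attach the scheduler-related annotation.
    metadata = pod.setdefault("metadata", {})
    metadata.setdefault("annotations", {})[
        "scheduling.k8s.io/group-name"] = pod_group
    return pod

pod = {"metadata": {"name": "flink-taskmanager"}, "spec": {"containers": []}}
decorated = decorate_pod_for_volcano(pod, "flink-session-podgroup")
```

Requirement 3 then says the PodGroup object named here must already exist in
the cluster before this pod is submitted.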

For complex requirements, Spark uses feature steps [1] (Flink's decorators
look very similar to them).
For simple requirements, users can just use configuration or a pod template.
[1]
https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes
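The feature-step/decorator pattern can be sketched as composable hooks that
each take a pod manifest and return a modified copy. Spark's actual feature
steps are Scala interfaces and Flink's decorators are Java classes; this
Python sketch only shows the shape of the pattern, and all names in it are
illustrative, not real Spark or Flink APIs:

```python
from copy import deepcopy

# Each "step" is a pure function: pod manifest in, modified copy out.
def set_scheduler_name(pod: dict) -> dict:
    pod = deepcopy(pod)
    pod.setdefault("spec", {})["schedulerName"] = "my-scheduler"
    return pod

def add_queue_label(pod: dict) -> dict:
    pod = deepcopy(pod)
    pod.setdefault("metadata", {}).setdefault("labels", {})["queue"] = "root.default"
    return pod

def apply_feature_steps(pod: dict, steps) -> dict:
    # Steps are applied in order, like a decorator chain.
    for step in steps:
        pod = step(pod)
    return pod

result = apply_feature_steps({"metadata": {}, "spec": {}},
                             [set_scheduler_name, add_queue_label])
```

Simple requirements (just a scheduler name or a fixed label) collapse to a
one-line step like these; complex ones (creating a CRD first, computing a
task group) get their own step with real logic inside.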

From the FLIP, I can see that the above requirements are covered.

BTW, I think the existing and newly added interfaces of Flink's decorators
already cover all Kubernetes requirements, so I personally think the
K8s-related scheduler requirements can be well covered by them too.

Regards,
Yikun


On Thu, Jul 14, 2022 at 5:11 PM Yang Wang <danrtsey...@gmail.com> wrote:

> I think we could go over the customized scheduler plugin mechanism again
> with YuniKorn to make sure that it is common enough.
> But the implementation could be deferred.
>
> And maybe we also could ping Yikun Jiang who has done similar things in
> Spark.
>
> For the e2e tests, I admit that they could be improved. But I am not sure
> whether we really need the java implementation instead.
> This is out of the scope of this FLIP and let's keep the discussion
> under FLINK-20392.
>
>
> Best,
> Yang
>
> Martijn Visser <martijnvis...@apache.org> 于2022年7月14日周四 15:28写道:
>
> > Hi Bo,
> >
> > Thanks for the info! I think I see that you've already updated the FLIP
> to
> > reflect how customized schedulers are beneficial for both batch and
> > streaming jobs.
> >
> > The reason why I'm not too happy that we would only create a reference
> > implementation for Volcano is that we don't know if the generic support
> for
> > customized scheduler plugins will also work for others. We think it will,
> > but since there would be no other implementation available, we are not
> > sure. My concern is that when someone tries to add support for another
> > scheduler, we notice that we actually made a mistake or should improve
> the
> > generic support.
> >
> > Best regards,
> >
> > Martijn
> >
> >
> >
> > Op do 14 jul. 2022 om 05:30 schreef bo zhaobo <
> bzhaojyathousa...@gmail.com
> > >:
> >
> > > Hi Martijn,
> > >
> > > Thank you for your comments. I will answer the questions one by one.
> > >
> > > ""
> > > * Regarding the motivation, it mentions that the development trend is
> > that
> > > Flink supports both batch and stream processing. I think the vision and
> > > trend is that we have unified batch- and stream processing. What I'm
> > > missing is the vision on what's the impact for customized Kubernetes
> > > schedulers on stream processing. Could there be some elaboration on
> that?
> > > ""
> > >
> > > >>
> > >
> > > We very much agree with you and with the dev trend that Flink supports
> > > both batch and stream processing. Actually, using a K8s customized
> > > scheduler is beneficial for streaming scenarios too, for example to
> > > avoid resource deadlock: suppose the remaining resources in the K8s
> > > cluster are only enough to run one job, but we submit two. With the
> > > default K8s scheduler, both jobs will hang while requesting resources
> > > at the same time, whereas the customized scheduler Volcano won't
> > > schedule overcommitted pods if the idle resources cannot fit all the
> > > remaining pods. So the benefits mentioned in the FLIP are not only for
> > > batch jobs. In fact, all 4 scheduling capabilities mentioned in the
> > > FLIP are required for stream processing. YARN has some of those
> > > scheduling features too, such as priority scheduling and min/max
> > > resource constraints.
> > >
> > > ""
> > > * While the FLIP talks about customized schedulers, it focuses on
> > Volcano.
> > > Why is the choice made to only focus on Volcano and not on other
> > schedulers
> > > like Apache YuniKorn? Can we not also provide an implementation for
> > > YuniKorn at the same time? Spark did the same with SPARK-36057 [1]
> > > ""
> > >
> > > >>
> > >
> > > Let me make this clearer. The FLIP consists of two parts:
> > > 1. Introducing a customized scheduler plugin mechanism in Flink's K8s
> > > support. This part is a general consideration.
> > > 2. Introducing ONE reference implementation of a customized scheduler.
> > > Volcano is just one of them; if other schedulers or people are
> > > interested, the integration of other schedulers can also be easily
> > > completed.
> > >
> > > ""
> > > * We still have quite a lot of tech debt on testing for Kubernetes
> [2]. I
> > > think that this FLIP would be a great improvement for Flink, but I am
> > > worried that we will add more tech debt to those tests. Can we somehow
> > > improve this situation?
> > > ""
> > >
> > > >>
> > >
> > > Yeah, we will pay attention to the testing problems related to Flink
> > > K8s, and we are happy to improve them. ;-)
> > >
> > > BR,
> > >
> > > Bo Zhao
> > >
> > > Martijn Visser <martijnvis...@apache.org> 于2022年7月13日周三 15:19写道:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for the FLIP. I have a couple of remarks/questions:
> > > >
> > > > * Regarding the motivation, it mentions that the development trend is
> > > that
> > > > Flink supports both batch and stream processing. I think the vision
> and
> > > > trend is that we have unified batch- and stream processing. What I'm
> > > > missing is the vision on what's the impact for customized Kubernetes
> > > > schedulers on stream processing. Could there be some elaboration on
> > that?
> > > > * While the FLIP talks about customized schedulers, it focuses on
> > > Volcano.
> > > > Why is the choice made to only focus on Volcano and not on other
> > > schedulers
> > > > like Apache YuniKorn? Can we not also provide an implementation for
> > > > YuniKorn at the same time? Spark did the same with SPARK-36057 [1]
> > > > * We still have quite a lot of tech debt on testing for Kubernetes
> > [2]. I
> > > > think that this FLIP would be a great improvement for Flink, but I am
> > > > worried that we will add more tech debt to those tests. Can we
> somehow
> > > > improve this situation?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > [1] https://issues.apache.org/jira/browse/SPARK-36057
> > > > [2] https://issues.apache.org/jira/browse/FLINK-20392
> > > >
> > > > Op wo 13 jul. 2022 om 04:11 schreef 王正 <cswangzh...@gmail.com>:
> > > >
> > > > > +1
> > > > >
> > > > > On 2022/07/07 01:15:13 bo zhaobo wrote:
> > > > > > Hi, all.
> > > > > >
> > > > > > I would like to raise a discussion in Flink dev ML about Support
> > > > > Customized
> > > > > > Kubernetes Schedulers.
> > > > > > Currently, Kubernetes is becoming more and more popular for
> > > > > > Flink cluster deployment, and its capabilities keep improving;
> > > > > > in particular, it supports customized scheduling.
> > > > > > Essentially, for high-performance workloads, we need to apply
> > > > > > new scheduling policies to meet new requirements. Today the
> > > > > > Flink native Kubernetes solution uses the default Kubernetes
> > > > > > scheduler for all scenarios, and the default scheduling policy
> > > > > > might be difficult to apply in some extreme cases. So we need to
> > > > > > improve Flink Kubernetes to couple those customized Kubernetes
> > > > > > schedulers with Flink native Kubernetes, providing a way for
> > > > > > Flink administrators or users to deploy their Flink clusters on
> > > > > > Kubernetes more flexibly.
> > > > > >
> > > > > > The proposal will introduce a customized K8s scheduler plugin
> > > > > > mechanism and a reference implementation, 'Volcano', in Flink.
> > > > > > More details in [1].
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-250%3A+Support+Customized+Kubernetes+Schedulers+Proposal
> > > > > >
> > > > > > Thanks,
> > > > > > BR
> > > > > >
> > > >
> > >
> >
>