Re: [DISCUSS] FLIP-250: Support Customized Kubernetes Schedulers Proposal

bo zhaobo Fri, 22 Jul 2022 01:41:59 -0700

Hi All,

Thanks for all your feedback. All of it is helpful and valuable to us.


If there are no further comments on FLIP-250, we plan to start a VOTE
thread next Monday.

Thank you all!

BR

Bo Zhao


bo zhaobo <bzhaojyathousa...@gmail.com> wrote on Fri, Jul 15, 2022 at 10:02:

> Thanks all, @Yang Wang and @Yikun Jiang.
>
> Hi Martijn,
>
> We understand your concern. Do the emails above address your doubts?
>
> "
> Thanks for the info! I think I see that you've already updated the FLIP to
> reflect how customized schedulers are beneficial for both batch and
> streaming jobs.
> "
>
> >>>
>
> Yes, it's true that the "Motivation" paragraph confused readers, so I
> updated the FLIP description. Thanks for your feedback and correction.
>
> "
> The reason why I'm not too happy that we would only create a reference
> implementation for Volcano is that we don't know if the generic support for
> customized scheduler plugins will also work for others. We think it will,
> but since there would be no other implementation available, we are not
> sure. My concern is that when someone tries to add support for another
> scheduler, we notice that we actually made a mistake or should improve the
> generic support.
> "
>
> >>>
>
> Yes, I understand your concern. Do Yikun Jiang's description and shared
> experience help address it? Or should we identify the common parts of some
> popular customized K8S schedulers and update the doc? Looking forward to
> your advice. ;-)
>
> Best regards,
>
> Bo Zhao
>
> Yikun Jiang <yikunk...@gmail.com> wrote on Thu, Jul 14, 2022 at 18:45:
>
>> > And maybe we also could ping Yikun Jiang who has done similar things in
>> Spark.
>>
>> Thanks for the ping, @wangyang. Yes, I was involved in Spark's customized
>> scheduler support work as the main contributor.
>>
>> For customized scheduler support, I can share the schedulers' requirements
>> here:
>>
>> 1. Help the scheduler to *specify* the scheduler name
>>
>> 2. Help the scheduler to create the *scheduler-related
>> label/annotation/CRD*, such as
>> - YuniKorn needs labels/annotations
>> <https://yunikorn.apache.org/docs/user_guide/labels_and_annotations_in_yunikorn/>
>> (possibly a task group CRD in the future)
>> - Volcano needs annotations and a CRD <https://volcano.sh/en/docs/podgroup/>
>> - Kube-batch needs annotations/CRD
>> <https://github.com/kubernetes-sigs/kube-batch/tree/master/config/crds>
>> - Kueue needs annotation support
>> <https://github.com/kubernetes-sigs/kueue/blob/888cedb6e62c315e008916086308a893cd21dd66/config/samples/sample-job.yaml#L6>
>> and a cluster-level CRD
>>
>> 3. Help the scheduler to create the scheduler meta/CRD at the *right
>> time*; for example, if users want to avoid pod max pending, we need to
>> create the scheduler-required CRD before pod creation.
>>
>> For complex requirements, Spark uses feature steps (Flink decorators look
>> very similar). For simple requirements, users can just use configuration
>> or a Pod Template. [1]
>>
>> [1] https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes
>>
>> From the FLIP, I can see the above requirements are covered.
>>
>> BTW, I think the existing and newly added interfaces of Flink decorators
>> already cover all the requirements of Kubernetes, so I personally think the
>> K8s-related scheduler requirements can also be well covered by them.
>>
>> Regards,
>> Yikun
>>
>>
>> On Thu, Jul 14, 2022 at 5:11 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>> > I think we could go over the customized scheduler plugin mechanism again
>> > with YuniKorn to make sure that it is common enough.
>> > But the implementation could be deferred.
>> >
>> > And maybe we also could ping Yikun Jiang who has done similar things in
>> > Spark.
>> >
>> > For the e2e tests, I admit that they could be improved. But I am not sure
>> > whether we really need the Java implementation instead.
>> > This is out of the scope of this FLIP, so let's keep the discussion
>> > under FLINK-20392.
>> >
>> >
>> > Best,
>> > Yang
>> >
>> > Martijn Visser <martijnvis...@apache.org> wrote on Thu, Jul 14, 2022 at 15:28:
>> >
>> > > Hi Bo,
>> > >
>> > > Thanks for the info! I think I see that you've already updated the FLIP to
>> > > reflect how customized schedulers are beneficial for both batch and
>> > > streaming jobs.
>> > >
>> > > The reason why I'm not too happy that we would only create a reference
>> > > implementation for Volcano is that we don't know if the generic support for
>> > > customized scheduler plugins will also work for others. We think it will,
>> > > but since there would be no other implementation available, we are not
>> > > sure. My concern is that when someone tries to add support for another
>> > > scheduler, we notice that we actually made a mistake or should improve the
>> > > generic support.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > >
>> > >
>> > > On Thu, Jul 14, 2022 at 05:30, bo zhaobo <bzhaojyathousa...@gmail.com> wrote:
>> > >
>> > > > Hi Martijn,
>> > > >
>> > > > Thank you for your comments. I will answer the questions one by one.
>> > > >
>> > > > ""
>> > > > * Regarding the motivation, it mentions that the development trend is that
>> > > > Flink supports both batch and stream processing. I think the vision and
>> > > > trend is that we have unified batch- and stream processing. What I'm
>> > > > missing is the vision on what's the impact for customized Kubernetes
>> > > > schedulers on stream processing. Could there be some elaboration on that?
>> > > > ""
>> > > >
>> > > > >>
>> > > >
>> > > > We very much agree with you and with the development trend that Flink
>> > > > supports both batch and stream processing. Actually, using a customized
>> > > > K8S scheduler is beneficial for streaming scenarios too, for example to
>> > > > avoid resource deadlock. Suppose the remaining resources in the K8S
>> > > > cluster are only enough to run one job, but we submit two. With the
>> > > > default K8S scheduler, both jobs request resources at the same time,
>> > > > block each other, and hang. In this case, the customized scheduler
>> > > > Volcano won't schedule the overcommitted pods if the idle resources
>> > > > cannot fit all of the pods that follow. So the benefits mentioned in the
>> > > > FLIP are not only for batch jobs. In fact, all 4 scheduling capabilities
>> > > > mentioned in the FLIP are also required for stream processing. YARN has
>> > > > some of those scheduling features too, such as priority scheduling and
>> > > > min/max resource constraints.
>> > > >
>> > > > ""
>> > > > * While the FLIP talks about customized schedulers, it focuses on Volcano.
>> > > > Why is the choice made to only focus on Volcano and not on other schedulers
>> > > > like Apache YuniKorn? Can we not also provide an implementation for
>> > > > YuniKorn at the same time? Spark did the same with SPARK-36057 [1]
>> > > > ""
>> > > >
>> > > > >>
>> > > >
>> > > > Let me clarify this. The FLIP consists of two parts:
>> > > > 1. Introducing a customized scheduler plugin mechanism in Flink K8S.
>> > > > This part is generic.
>> > > > 2. Introducing ONE reference implementation for a customized scheduler.
>> > > > Volcano is just one such scheduler; if people are interested in other
>> > > > schedulers, integrating them can also be completed easily.
>> > > >
>> > > > ""
>> > > > * We still have quite a lot of tech debt on testing for Kubernetes [2]. I
>> > > > think that this FLIP would be a great improvement for Flink, but I am
>> > > > worried that we will add more tech debt to those tests. Can we somehow
>> > > > improve this situation?
>> > > > ""
>> > > >
>> > > > >>
>> > > >
>> > > > Yes, we will pay attention to the test problems related to Flink K8S,
>> > > > and we are happy to help improve them. ;-)
>> > > >
>> > > > BR,
>> > > >
>> > > > Bo Zhao
>> > > >
>> > > > Martijn Visser <martijnvis...@apache.org> wrote on Wed, Jul 13, 2022 at 15:19:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > Thanks for the FLIP. I have a couple of remarks/questions:
>> > > > >
>> > > > > * Regarding the motivation, it mentions that the development trend is
>> > > > > that Flink supports both batch and stream processing. I think the vision
>> > > > > and trend is that we have unified batch- and stream processing. What I'm
>> > > > > missing is the vision on what's the impact for customized Kubernetes
>> > > > > schedulers on stream processing. Could there be some elaboration on that?
>> > > > > * While the FLIP talks about customized schedulers, it focuses on Volcano.
>> > > > > Why is the choice made to only focus on Volcano and not on other
>> > > > > schedulers like Apache YuniKorn? Can we not also provide an implementation
>> > > > > for YuniKorn at the same time? Spark did the same with SPARK-36057 [1]
>> > > > > * We still have quite a lot of tech debt on testing for Kubernetes [2]. I
>> > > > > think that this FLIP would be a great improvement for Flink, but I am
>> > > > > worried that we will add more tech debt to those tests. Can we somehow
>> > > > > improve this situation?
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Martijn
>> > > > >
>> > > > > [1] https://issues.apache.org/jira/browse/SPARK-36057
>> > > > > [2] https://issues.apache.org/jira/browse/FLINK-20392
>> > > > >
>> > > > > On Wed, Jul 13, 2022 at 04:11, 王正 <cswangzh...@gmail.com> wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > On 2022/07/07 01:15:13 bo zhaobo wrote:
>> > > > > > > Hi, all.
>> > > > > > >
>> > > > > > > I would like to start a discussion on the Flink dev ML about supporting
>> > > > > > > customized Kubernetes schedulers.
>> > > > > > > Currently, Kubernetes is becoming more and more popular for Flink cluster
>> > > > > > > deployment, and its capabilities keep improving; in particular, it
>> > > > > > > supports customized scheduling.
>> > > > > > > Essentially, high-performance workloads need new scheduling policies to
>> > > > > > > meet new requirements. Today the Flink native Kubernetes solution uses
>> > > > > > > the default Kubernetes scheduler for all scenarios, and the default
>> > > > > > > scheduling policy might be difficult to apply in some extreme cases. So
>> > > > > > > we need to improve Flink on Kubernetes to couple those customized
>> > > > > > > Kubernetes schedulers with Flink native Kubernetes, giving Flink
>> > > > > > > administrators and users a more flexible way to run their Flink clusters
>> > > > > > > on Kubernetes.
>> > > > > > >
>> > > > > > > The proposal introduces a plugin mechanism for customized K8S schedulers
>> > > > > > > and a reference implementation, 'Volcano', in Flink. For more details,
>> > > > > > > see [1].
>> > > > > > >
>> > > > > > > Looking forward to your feedback.
>> > > > > > >
>> > > > > > > [1]
>> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-250%3A+Support+Customized+Kubernetes+Schedulers+Proposal
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > BR
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
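
The three scheduler requirements Yikun Jiang lists in the quoted thread (set the
scheduler name, attach scheduler-related annotations, and create the scheduler's
CRD before the pods) can be illustrated with a small sketch. This is not Flink's
decorator interface or Spark's feature step; the class and function names are
hypothetical, and the Volcano scheduler name, annotation key, and PodGroup fields
are assumptions that should be checked against the Volcano documentation linked
in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a Kubernetes pod spec."""
    name: str
    scheduler_name: str = "default-scheduler"
    annotations: dict = field(default_factory=dict)


class VolcanoLikeDecorator:
    """Hypothetical decorator; not Flink's or Spark's actual interface."""

    def __init__(self, job_name: str, min_member: int):
        self.group_name = f"{job_name}-podgroup"
        self.min_member = min_member

    def pre_pod_resources(self) -> list:
        # Requirement 3: scheduler metadata that must exist before any pod.
        # The apiVersion and fields below are assumptions about Volcano's CRD.
        return [{
            "apiVersion": "scheduling.volcano.sh/v1beta1",
            "kind": "PodGroup",
            "metadata": {"name": self.group_name},
            "spec": {"minMember": self.min_member},
        }]

    def decorate(self, pod: Pod) -> Pod:
        # Requirement 1: point the pod at the customized scheduler.
        pod.scheduler_name = "volcano"
        # Requirement 2: attach the scheduler-related annotation (assumed key).
        pod.annotations["scheduling.k8s.io/group-name"] = self.group_name
        return pod


def build_submission(decorator, pods):
    """Return resources in creation order: scheduler metadata first, then pods."""
    return decorator.pre_pod_resources() + [decorator.decorate(p) for p in pods]


decorator = VolcanoLikeDecorator("wordcount", min_member=2)
plan = build_submission(decorator, [Pod("jobmanager"), Pod("taskmanager-1")])
```

Returning the scheduler metadata ahead of the pods in a single ordered list is
only one way to express the "right time" requirement; a different scheduler
would swap in its own decorator without changing `build_submission`.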

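The resource deadlock that Bo Zhao describes in the quoted thread (two jobs
submitted while the cluster can fit only one) can be reproduced with a toy
model. The sketch below is a deliberate simplification, not the real behavior of
the default Kubernetes scheduler or of Volcano: it assumes pods from both jobs
are placed in round-robin order, that a job can run only once all of its pods
are placed, and that resources are counted as identical slots. All names are
hypothetical.

```python
def greedy_schedule(free_slots, jobs):
    """Pod-by-pod placement: pods from all jobs compete in round-robin order."""
    placed = {name: 0 for name in jobs}
    progress = True
    while free_slots > 0 and progress:
        progress = False
        for name, needed in jobs.items():
            if free_slots > 0 and placed[name] < needed:
                placed[name] += 1
                free_slots -= 1
                progress = True
    return placed


def gang_schedule(free_slots, jobs):
    """All-or-nothing placement: admit a job only if all of its pods fit."""
    placed = {name: 0 for name in jobs}
    for name, needed in jobs.items():
        if needed <= free_slots:
            placed[name] = needed
            free_slots -= needed
    return placed


def runnable(placed, jobs):
    """A job can start only when every one of its pods has been placed."""
    return [name for name, needed in jobs.items() if placed[name] == needed]


jobs = {"job-a": 4, "job-b": 4}  # each job needs 4 pods
free_slots = 4                   # the cluster can fit only one job

greedy = greedy_schedule(free_slots, jobs)
gang = gang_schedule(free_slots, jobs)
```

Under these assumptions the pod-by-pod policy gives each job half of what it
needs, so `runnable(greedy, jobs)` is empty, while the all-or-nothing policy
admits `job-a` in full and leaves `job-b` waiting without holding any slots.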