Do we want to move the SPIP forward to a vote? It seems like we're mostly agreeing in principle?
On Wed, Jan 5, 2022 at 11:12 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Bo,
>
> Thanks for the info. Let me elaborate:
>
> In theory you can set the number of executors to a multiple of the number
> of nodes. For example, if you have a three-node k8s cluster (in my case
> Google GKE), you can set the number of executors to 6 and end up with six
> executors queuing to start, but you ultimately finish with two running
> executors plus the driver in a 3-node cluster, as shown below:
>
> hduser@ctpvm: /home/hduser> k get pods -n spark
> NAME                                         READY   STATUS    RESTARTS   AGE
> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          33s
> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          33s
> randomdatabigquery-d42d067e2b91c88a-exec-3   0/1     Pending   0          33s
> randomdatabigquery-d42d067e2b91c88a-exec-4   0/1     Pending   0          33s
> randomdatabigquery-d42d067e2b91c88a-exec-5   0/1     Pending   0          33s
> randomdatabigquery-d42d067e2b91c88a-exec-6   0/1     Pending   0          33s
> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          45s
>
> A few seconds later the Pending executors have gone:
>
> hduser@ctpvm: /home/hduser> k get pods -n spark
> NAME                                         READY   STATUS    RESTARTS   AGE
> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          40s
> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          40s
> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          52s
>
> So you end up with the Pending executors dropping out. Hence the
> conclusion seems to be that, with the current model, you want to fit
> exactly one Spark executor pod per Kubernetes node.
>
> HTH
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Wed, 5 Jan 2022 at 17:01, bo yang <bobyan...@gmail.com> wrote:
>
>> Hi Mich,
>>
>> Curious what you mean by "The constraint seems to be that you can fit one
>> Spark executor pod per Kubernetes node and from my tests you don't seem to
>> be able to allocate more than 50% of RAM on the node to the container".
>> Would you help to explain a bit? Asking because there could be multiple
>> executor pods running on a single Kubernetes node.
>>
>> Thanks,
>> Bo
>>
>> On Wed, Jan 5, 2022 at 1:13 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Thanks William for the info.
>>>
>>> The current model of Spark on k8s has certain drawbacks with pod-based
>>> scheduling, as I tested it on Google Kubernetes Engine (GKE). The
>>> constraint seems to be that you can fit one Spark executor pod per
>>> Kubernetes node, and from my tests you don't seem to be able to allocate
>>> more than 50% of the RAM on the node to the container.
>>>
>>> [image: gke_memoeyPlot.png]
>>>
>>> Asking for any more leaves the container never being created (stuck at
>>> Pending):
>>>
>>> kubectl describe pod sparkbq-b506ac7dc521b667-driver -n spark
>>>
>>> Events:
>>>   Type     Reason             Age                   From                Message
>>>   ----     ------             ---                   ----                -------
>>>   Warning  FailedScheduling   17m                   default-scheduler   0/3 nodes are available: 3 Insufficient memory.
>>>   Warning  FailedScheduling   17m                   default-scheduler   0/3 nodes are available: 3 Insufficient memory.
>>>   Normal   NotTriggerScaleUp  2m28s (x92 over 17m)  cluster-autoscaler  pod didn't trigger scale-up:
>>>
>>> Obviously this is far from ideal, and this model, although it works, is
>>> not efficient.
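As a rough sketch of the arithmetic behind that "Insufficient memory" output (illustrative numbers only, not taken from the thread; it assumes Spark's default ~10% executor memory overhead with a 384 MiB floor, and a node whose allocatable memory is well below its nominal size because of kubelet/system reservations):

```python
# Back-of-the-envelope fit check: an executor pod's memory request is
# spark.executor.memory plus the memory overhead, and it must fit within
# the node's *allocatable* memory, not the machine size.
def executor_pod_memory_request_mib(executor_memory_mib,
                                    overhead_factor=0.10,
                                    min_overhead_mib=384):
    """Pod memory request = executor memory + max(overhead, 384 MiB)."""
    overhead = max(int(executor_memory_mib * overhead_factor), min_overhead_mib)
    return executor_memory_mib + overhead

def executors_per_node(node_allocatable_mib, executor_memory_mib):
    """How many executor pods fit on one node, by memory alone."""
    return node_allocatable_mib // executor_pod_memory_request_mib(executor_memory_mib)

# Hypothetical node with ~12.5 GiB allocatable out of 16 GiB, and 8 GiB
# executors: only one executor fits per node, so a 6-executor job on 3
# such nodes leaves executors Pending, as in the kubectl output above.
print(executors_per_node(12800, 8192))  # 1 executor fits per node
```

The node sizes, overhead factor, and helper names here are assumptions for illustration; the actual request depends on spark.kubernetes.memoryOverheadFactor and the node's reported allocatable memory.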
>>>
>>> Cheers,
>>>
>>> Mich
>>>
>>> On Wed, 5 Jan 2022 at 03:55, William Wang <wang.platf...@gmail.com> wrote:
>>>
>>>> Hi Mich,
>>>>
>>>> Here are some of the performance indications for Volcano:
>>>> 1. Scheduler throughput: 1.5k pods/s (default scheduler: 100 pods/s).
>>>> 2. Spark application performance improved by 30%+ with the minimal
>>>>    resource reservation feature in the case of insufficient resources
>>>>    (tested with TPC-DS).
>>>>
>>>> We are still working on more optimizations. Besides performance, Volcano
>>>> is being continuously enhanced in the four directions below to provide
>>>> the abilities that users care about:
>>>> - Full lifecycle management for jobs
>>>> - Scheduling policies for high-performance workloads (fair-share,
>>>>   topology, SLA, reservation, preemption, backfill, etc.)
>>>> - Support for heterogeneous hardware
>>>> - Performance optimization for high-performance workloads
>>>>
>>>> Thanks,
>>>> LeiBo
>>>>
>>>> On Tue, Jan 4, 2022 at 18:12, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Interesting, thanks.
>>>>>
>>>>> Do you have any indication of a ballpark figure (a rough numerical
>>>>> estimate) of how much adding Volcano as an alternative scheduler is
>>>>> going to improve Spark on k8s performance?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, 4 Jan 2022 at 09:43, Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>
>>>>>> Hi, folks! Wishing you all the best in 2022.
>>>>>>
>>>>>> I'd like to share the current status of "Support Customized K8S
>>>>>> Scheduler in Spark":
>>>>>>
>>>>>> https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg/edit#heading=h.1quyr1r2kr5n
>>>>>>
>>>>>> Framework/common support:
>>>>>>
>>>>>> - The Volcano and YuniKorn teams have joined the discussion and
>>>>>>   completed the initial doc on the framework/common part.
>>>>>>
>>>>>> - SPARK-37145 <https://issues.apache.org/jira/browse/SPARK-37145>
>>>>>>   (under review): We proposed to extend the customized scheduler by
>>>>>>   just using a custom feature step; it will meet the requirements of
>>>>>>   customized schedulers after it gets merged.
>>>>>>   After this, the user can enable the feature step and scheduler like:
>>>>>>
>>>>>>   spark-submit \
>>>>>>     --conf spark.kubernetes.scheduler.name=volcano \
>>>>>>     --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.VolcanoFeatureStep \
>>>>>>     --conf spark.kubernetes.job.queue=xxx
>>>>>>
>>>>>>   (As above, the VolcanoFeatureStep will help to set the Spark
>>>>>>   scheduler queue according to the user-specified conf.)
>>>>>>
>>>>>> - SPARK-37331 <https://issues.apache.org/jira/browse/SPARK-37331>:
>>>>>>   Added the ability to create Kubernetes resources before driver pod
>>>>>>   creation.
>>>>>>
>>>>>> - SPARK-36059 <https://issues.apache.org/jira/browse/SPARK-36059>:
>>>>>>   Add the ability to specify a scheduler in the driver/executor.
>>>>>>
>>>>>> After all of the above, the framework/common support should be ready
>>>>>> for most customized schedulers.
>>>>>>
>>>>>> Volcano part:
>>>>>>
>>>>>> - SPARK-37258 <https://issues.apache.org/jira/browse/SPARK-37258>:
>>>>>>   Upgrade kubernetes-client to 5.11.1 to add Volcano scheduler API
>>>>>>   support.
>>>>>>
>>>>>> - SPARK-36061 <https://issues.apache.org/jira/browse/SPARK-36061>:
>>>>>>   Add a VolcanoFeatureStep to help users create a PodGroup with the
>>>>>>   user-specified minimum resources required; there is also a WIP commit
>>>>>>   to show a preview of this
>>>>>>   <https://github.com/Yikun/spark/pull/45/commits/81bf6f98edb5c00ebd0662dc172bc73f980b6a34>.
>>>>>>
>>>>>> YuniKorn part:
>>>>>>
>>>>>> - @WeiweiYang is completing the doc for the YuniKorn part and
>>>>>>   implementing it.
>>>>>>
>>>>>> Regards,
>>>>>> Yikun
>>>>>>
>>>>>> On Thu, Dec 2, 2021 at 02:00, Weiwei Yang <w...@apache.org> wrote:
>>>>>>
>>>>>>> Thank you Yikun for the info, and thanks for inviting me to a
>>>>>>> meeting to discuss this.
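To make the minimum-resources idea discussed above concrete, here is a toy sketch of gang admission (pure illustration, not Volcano or Spark code; the function and numbers are invented for the example): a job is admitted only if the cluster can hold all of its pods at once, which avoids partially-started jobs like the 2-of-6 executor case earlier in the thread.

```python
# Toy gang-admission check: admit the whole pod group or nothing.
def gang_admit(free_cpu, free_mem_gib, pods):
    """pods is a list of (cpu, mem_gib) requests; admit all-or-nothing."""
    need_cpu = sum(cpu for cpu, _ in pods)
    need_mem = sum(mem for _, mem in pods)
    return need_cpu <= free_cpu and need_mem <= free_mem_gib

# Hypothetical cluster with 12 CPUs / 48 GiB free; a driver (1 CPU, 4 GiB)
# plus 6 executors (2 CPU, 8 GiB each) needs 13 CPUs / 52 GiB, so the whole
# job waits -- rather than 5 executors starting and one pending forever.
job = [(1, 4)] + [(2, 8)] * 6
print(gang_admit(12, 48, job))  # False: hold the job instead of part-running it
```

A real scheduler also has to place pods on individual nodes, handle queues, and reserve resources; this only shows the all-or-nothing admission decision that a PodGroup's minimum resources enable.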
>>>>>>> I appreciate your effort to put these together, and I agree that the
>>>>>>> purpose is to make Spark easy/flexible enough to support other K8s
>>>>>>> schedulers (not just Volcano).
>>>>>>> As discussed, could you please help to abstract out the things in
>>>>>>> common and allow Spark to plug in different implementations? I'd be
>>>>>>> happy to work with you guys on this issue.
>>>>>>>
>>>>>>> On Tue, Nov 30, 2021 at 6:49 PM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Weiwei @Chenya
>>>>>>>>
>>>>>>>> > Thanks for bringing this up. This is quite interesting, we
>>>>>>>> > definitely should participate more in the discussions.
>>>>>>>>
>>>>>>>> Thanks for your reply, and welcome to the discussion; I think the
>>>>>>>> input from YuniKorn is critical.
>>>>>>>>
>>>>>>>> > The main thing here is, the Spark community should make Spark
>>>>>>>> > pluggable in order to support other schedulers, not just for
>>>>>>>> > Volcano. It looks like this proposal is pushing really hard for
>>>>>>>> > adopting PodGroup, which isn't part of K8s yet, that to me is
>>>>>>>> > problematic.
>>>>>>>>
>>>>>>>> Definitely yes, we are on the same page.
>>>>>>>>
>>>>>>>> I think we have the same goal: propose a general and reasonable
>>>>>>>> mechanism to make Spark on k8s with a custom scheduler more usable.
>>>>>>>>
>>>>>>>> As for PodGroup, allow me to give a brief introduction:
>>>>>>>> - The PodGroup definition has been approved officially by
>>>>>>>>   Kubernetes in KEP-583. [1]
>>>>>>>> - It can be regarded as a general concept/standard in Kubernetes
>>>>>>>>   rather than a Volcano-specific concept; there are other
>>>>>>>>   implementations of it as well, such as [2][3].
>>>>>>>> - Kubernetes recommends using CRDs for this kind of extension. [4]
>>>>>>>> - Volcano, as an extension, provides an interface to maintain the
>>>>>>>>   lifecycle of the PodGroup CRD and uses volcano-scheduler to
>>>>>>>>   complete the scheduling.
>>>>>>>>
>>>>>>>> [1] https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/583-coscheduling
>>>>>>>> [2] https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/coscheduling#podgroup
>>>>>>>> [3] https://github.com/kubernetes-sigs/kube-batch
>>>>>>>> [4] https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Yikun
>>>>>>>>
>>>>>>>> On Wed, Dec 1, 2021 at 05:57, Weiwei Yang <w...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Chenya,
>>>>>>>>>
>>>>>>>>> Thanks for bringing this up. This is quite interesting; we
>>>>>>>>> definitely should participate more in the discussions.
>>>>>>>>> The main thing here is that the Spark community should make Spark
>>>>>>>>> pluggable in order to support other schedulers, not just Volcano.
>>>>>>>>> It looks like this proposal is pushing really hard for adopting
>>>>>>>>> PodGroup, which isn't part of K8s yet; that, to me, is problematic.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 30, 2021 at 9:21 AM Prasad Paravatha <prasad.parava...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is a great feature/idea.
>>>>>>>>>> I'd love to get involved in some form (testing and/or
>>>>>>>>>> documentation). This could be my first contribution to Spark!
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 30, 2021 at 10:46 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 Kudos to Yikun and the community for starting the discussion!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 30, 2021 at 8:47 AM Chenya Zhang <chenyazhangche...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks folks for bringing up the topic of natively integrating
>>>>>>>>>>>> Volcano and other alternative schedulers into Spark!
>>>>>>>>>>>>
>>>>>>>>>>>> +Weiwei, Wilfred, Chaoran. We would love to contribute to the
>>>>>>>>>>>> discussion as well.
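For context on what adopting PodGroup means in practice, a minimal Volcano-style PodGroup manifest looks roughly like the sketch below (field names follow Volcano's scheduling.volcano.sh/v1beta1 API; the name, namespace, and resource figures are illustrative, not from the thread):

```yaml
# Gang scheduling via a PodGroup: the scheduler will not start any pod in
# the group until it can place at least `minMember` pods with the declared
# minimum resources, so a Spark job either gets its driver plus executors
# together or stays Pending.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-job-podgroup   # illustrative name
  namespace: spark
spec:
  minMember: 3               # e.g. driver + 2 executors scheduled as a unit
  minResources:
    cpu: "6"
    memory: "24Gi"
  queue: default
```

Spark pods then opt in by setting schedulerName to volcano and referencing the group via an annotation, which is roughly what the proposed VolcanoFeatureStep automates.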
>>>>>>>>>>>>
>>>>>>>>>>>> From our side, we have been using and improving one alternative
>>>>>>>>>>>> resource scheduler, Apache YuniKorn
>>>>>>>>>>>> (https://yunikorn.apache.org/), for Spark on Kubernetes in
>>>>>>>>>>>> production at Apple, with solid results over the past year. It
>>>>>>>>>>>> is capable of supporting gang scheduling (similar to PodGroups),
>>>>>>>>>>>> multi-tenant resource queues (similar to YARN), FIFO, and other
>>>>>>>>>>>> handy features like bin packing to enable efficient autoscaling,
>>>>>>>>>>>> etc.
>>>>>>>>>>>>
>>>>>>>>>>>> Natively integrating with Spark would provide more flexibility
>>>>>>>>>>>> for users and reduce the extra cost and potential inconsistency
>>>>>>>>>>>> of maintaining different layers of resource strategies. One
>>>>>>>>>>>> interesting topic we hope to discuss more is dynamic allocation,
>>>>>>>>>>>> which would benefit from native coordination between Spark and
>>>>>>>>>>>> resource schedulers in K8s and cloud environments for optimal
>>>>>>>>>>>> resource efficiency.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 30, 2021 at 8:10 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for putting this together, I'm really excited for us to
>>>>>>>>>>>>> add better batch scheduling integrations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to start a discussion on the "Support
>>>>>>>>>>>>>> Volcano/Alternative Schedulers Proposal".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This SPIP is proposed to make Spark's k8s schedulers provide
>>>>>>>>>>>>>> more YARN-like features (such as queues and minimum resources
>>>>>>>>>>>>>> before scheduling jobs) that many folks want on Kubernetes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The goal of this SPIP is to improve the current Spark k8s
>>>>>>>>>>>>>> scheduler implementations, add the ability to do batch
>>>>>>>>>>>>>> scheduling, and support Volcano as one of the implementations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Design doc: https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg
>>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-36057
>>>>>>>>>>>>>> Some of the PRs:
>>>>>>>>>>>>>> - Ability to create resources: https://github.com/apache/spark/pull/34599
>>>>>>>>>>>>>> - Add PodGroupFeatureStep: https://github.com/apache/spark/pull/34456
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Yikun
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> John Zhuge
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Prasad Paravatha

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau