Re: Dynamic allocation on K8

2022-10-26 Thread Shrikant Prasad
Hi Nikhil, Spark on Kubernetes supports dynamic allocation using the shuffle tracking feature instead of the external shuffle service. To enable dynamic allocation, set these two configs to true: spark.dynamicAllocation.enabled and spark.dynamicAllocation.shuffleTracking.enabled R
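A minimal sketch of passing the two settings named above to `spark-submit`, assuming you assemble its argument list in Python. The helper name `conf_args` is hypothetical, and the Kubernetes master URL, container image, and application arguments are deliberately left out.

```python
# Sketch: build spark-submit --conf flags that enable dynamic allocation
# via shuffle tracking (no external shuffle service), as described above.
# conf_args is a hypothetical helper; other required spark-submit
# arguments (master URL, image, application) are intentionally omitted.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}


def conf_args(conf: dict[str, str]) -> list[str]:
    """Turn a {key: value} dict into ['--conf', 'key=value', ...]."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit", *conf_args(DYNAMIC_ALLOCATION_CONF)]))
```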

Re: Dynamic Scaling without Kubernetes

2022-10-26 Thread Artemis User
Wouldn't you need to run Spark on Hadoop in order to use YARN?  I believe that YARN only manages Hadoop nodes, not Spark workers directly.  Besides, what I read was that you would need some extra plug-ins to be able to get nodes managed dynamically. Our use case would be like this: 1. A Spark

Re: Dynamic Scaling without Kubernetes

2022-10-26 Thread Holden Karau
So Spark can dynamically scale on YARN, but standalone mode becomes a bit complicated: where do you envision Spark getting the extra resources from? On Wed, Oct 26, 2022 at 12:18 PM Artemis User wrote: > Has anyone tried to make a Spark cluster dynamically scalable, i.e., adding a new worker nod

Dynamic Scaling without Kubernetes

2022-10-26 Thread Artemis User
Has anyone tried to make a Spark cluster dynamically scalable, i.e., automatically adding a new worker node to the cluster when no more executors are available and a new job is submitted? We need to keep the whole cluster on-prem and really lightweight, so standalone mode is preferred and no k8s

Re: Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread Sean Owen
That just means G = GB of memory, C = cores, but yeah, the driver and executors are very small, possibly related. On Wed, Oct 26, 2022 at 12:34 PM Artemis User wrote: > Are these Cloudera-specific acronyms? Not sure how Cloudera configures Spark differently, but obviously the number of nodes is too

Re: Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread Artemis User
Are these Cloudera-specific acronyms? Not sure how Cloudera configures Spark differently, but obviously the number of nodes is too small, considering each app only uses a small number of cores and RAM. So you may consider increasing the number of nodes. When all these apps jam on a few nodes,

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Chao Sun
Congrats, everyone, and thanks Yuming for driving the release! On Wed, Oct 26, 2022 at 7:37 AM beliefer wrote: > Congratulations to everyone who contributed to this release. > At 2022-10-26 14:21:36, "Yuming Wang" wrote: >> We are happy to announce the availability of Apache Spark 3.3.1!

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread beliefer
Congratulations to everyone who contributed to this release. At 2022-10-26 14:21:36, "Yuming Wang" wrote: We are happy to announce the availability of Apache Spark 3.3.1! Spark 3.3.1 is a maintenance release containing stability fixes. This release is based on the branch-3.3 maintenance branch

Re: Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread Sean Owen
Resource contention. Now all the CPU and I/O is competing, which probably slows things down. On Wed, Oct 26, 2022, 5:37 AM eab...@163.com wrote: > Hi All, > I have a CDH5.16.2 Hadoop cluster with 1+3 nodes (64C/128G, 1NN/RM + 3DN/NM), and YARN with 192C/240G. I used the following test scenario: > 1.sp

Running 30 Spark applications at the same time is slower than one on average

2022-10-26 Thread eab...@163.com
Hi All, I have a CDH5.16.2 Hadoop cluster with 1+3 nodes (64C/128G, 1NN/RM + 3DN/NM), and YARN with 192C/240G. I used the following test scenario: 1. Spark app resources: 2G driver memory / 2C driver vcores / 1 executor / 2G executor memory / 2C executor vcores. 2. One Spark app will use 5G4C on yar
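A back-of-envelope check of the figures in this post, assuming each app needs exactly the 5G4C the post states and ignoring per-node placement and any non-Spark workload on the cluster. The function names are illustrative only.

```python
# Back-of-envelope YARN capacity check using the figures quoted in the post.
# Assumption: each app needs exactly 5 GB and 4 vcores (as the post states);
# per-node placement and other workloads are ignored.
YARN_VCORES = 192
YARN_MEM_GB = 240
APP_VCORES = 4
APP_MEM_GB = 5


def demand(num_apps: int) -> tuple[int, int]:
    """Total (vcores, GB) requested by num_apps identical apps."""
    return num_apps * APP_VCORES, num_apps * APP_MEM_GB


def fits(num_apps: int) -> bool:
    """True if the aggregate demand stays within YARN's totals."""
    cores, mem = demand(num_apps)
    return cores <= YARN_VCORES and mem <= YARN_MEM_GB


def max_concurrent_apps() -> int:
    """Largest number of apps both the vcore and memory totals can admit."""
    return min(YARN_VCORES // APP_VCORES, YARN_MEM_GB // APP_MEM_GB)


if __name__ == "__main__":
    cores, mem = demand(30)
    print(f"30 apps -> {cores} vcores, {mem} GB, fits: {fits(30)}")
    print(f"max concurrent apps: {max_concurrent_apps()}")
```

By these numbers alone, 30 apps request 120 of 192 vcores and 150 of 240 GB, so YARN can admit all of them at once (up to 48 by this count). Vcores are only a scheduling unit, though: the 30 apps still share the same physical CPUs and disks, which is where the replies in this thread place the slowdown.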

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Jacek Laskowski
Yoohoo! Thanks Yuming for driving this release. A tiny step for Spark, a huge one for my clients (who are still on 3.2.1 or even older :)) Regards, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on https://twitter.com/jac

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Yang,Jie(INF)
Thanks Yuming and all developers ~ Yang Jie From: Maxim Gekk Date: Wednesday, October 26, 2022, 15:19 To: Hyukjin Kwon Cc: "L. C. Hsieh", Dongjoon Hyun, Yuming Wang, dev, User Subject: Re: [ANNOUNCE] Apache Spark 3.3.1 released Congratulations, everyone, on the new release, and thanks to Yuming for his ef

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Maxim Gekk
Congratulations, everyone, on the new release, and thanks to Yuming for his efforts. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon wrote: > Thanks, Yuming. > On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh wrote: >> Thank you for driving the relea

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Hyukjin Kwon
Thanks, Yuming. On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh wrote: > Thank you for driving the release of Apache Spark 3.3.1, Yuming! > On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun wrote: >> It's great. Thank you so much, Yuming! >> Dongjoon >> On Tue, Oct 25, 2022 at 11:23 P

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread L. C. Hsieh
Thank you for driving the release of Apache Spark 3.3.1, Yuming! On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun wrote: > It's great. Thank you so much, Yuming! > Dongjoon > On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang wrote: >> We are happy to announce the availability of Apache Spark 3