Re: [VOTE] SPIP: Declarative Pipelines

2025-04-11 Thread John Zhuge
+1 (non-binding) On Fri, Apr 11, 2025 at 3:47 AM Ruifeng Zheng wrote: > +1 > > On Fri, Apr 11, 2025 at 12:37 PM Walaa Eldin Moustafa < > wa.moust...@gmail.com> wrote: > >> +1 (non-binding) >> >> On Thu, Apr 10, 2025 at 6:52 PM Liu Cao wrote: >> >>> +1 (non-binding) >>> >>> On Thu, Apr 10, 2025

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
👍

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Thanks for the input, Karan. I found the cause of the slowness in Kubernetes: broadcasts take 2 to 3 times longer than on YARN. I need to check why the difference is so big. In our case we manually broadcast around 40 tables, because the tables are larger than 10 MB. What can be done in Kubernetes
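[Editorial sketch, not part of the original message.] The 10 MB figure above matches the default value of Spark's `spark.sql.autoBroadcastJoinThreshold` setting (10485760 bytes), above which Spark does not broadcast a table automatically. As a rough illustration of the arithmetic, the hypothetical helper below (its name and parameters are assumptions, not something from the thread) builds `spark-submit` `--conf` arguments for a chosen threshold and for `spark.sql.broadcastTimeout`. Whether changing either setting would help with the Kubernetes slowness described here is not established by the thread.

```python
from typing import List


def broadcast_conf_args(threshold_mb: int, timeout_s: int = 300) -> List[str]:
    """Hypothetical helper: build spark-submit --conf flags for broadcast joins.

    threshold_mb: largest table size (in MB) Spark should broadcast automatically.
    timeout_s:    broadcast timeout in seconds (300 is Spark's documented default).
    """
    # Spark expects the threshold in bytes.
    threshold_bytes = threshold_mb * 1024 * 1024
    return [
        "--conf", f"spark.sql.autoBroadcastJoinThreshold={threshold_bytes}",
        "--conf", f"spark.sql.broadcastTimeout={timeout_s}",
    ]


if __name__ == "__main__":
    # Print the flags for the default 10 MB threshold.
    print(" ".join(broadcast_conf_args(10)))
```

The returned list can be appended to a `spark-submit` command line; tables above the threshold would still need an explicit broadcast hint, as the poster already does manually.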

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Please check whether there are resource constraints on the pods/nodes, especially if they are shared. MinIO connectivity performance also needs to be checked. With YARN and External Spark Shuffle, the Spark shuffle is a lot more optimized, so we can experience slowness with Spark on k8s, especially if there is a

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Karan, I am using open-source Spark on Kubernetes and the Spark MapR bundle on YARN. Launching the job takes the same 10 seconds in both approaches. For shuffle, I am using local in both YARN and Kubernetes. Sent from my iPhone On Apr 11, 2025, at 11:24 AM, karan alang wrote: Hi Prem, Which distribution of

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Hi Prem, Which distribution of Spark are you using? How long does it take to launch the job? With respect to Spark shuffle, which approach are you using: storing shuffle data in MinIO or using a host path? Regards, Karan On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo wrote: > Hello Team, > I have a pecu

subscribe

2025-04-11 Thread bin zhou

SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Team, I have a peculiar case of Spark slowness. I am using MinIO as object storage, which Spark reads from and writes to. With YARN as master, a Spark job takes ~5 mins; the same job run with Kubernetes as master takes ~8 mins. I checked the Spark DAG in both a

Re: [VOTE] SPIP: Declarative Pipelines

2025-04-11 Thread Ruifeng Zheng
+1 On Fri, Apr 11, 2025 at 12:37 PM Walaa Eldin Moustafa wrote: > +1 (non-binding) > > On Thu, Apr 10, 2025 at 6:52 PM Liu Cao wrote: > >> +1 (non-binding) >> >> On Thu, Apr 10, 2025 at 9:51 AM Prashant Singh >> wrote: >> >>> +1 (non-binding) >>> >>> On Thu, Apr 10, 2025 at 9:46 AM Xiao Li wr