Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
👍

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Thanks for the input, Karan. I found the source of the slowness on Kubernetes: broadcasts take 2 to 3 times longer than on YARN. I need to check why the difference is so large. In our case we manually broadcast around 40 tables, because each table is larger than 10 MB. What can be done in Kubernetes
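For anyone who wants to experiment with the broadcast settings mentioned above, here is a minimal sketch that assembles `spark-submit --conf` flags for the broadcast-related Spark properties. The 10 MB figure matches the default of `spark.sql.autoBroadcastJoinThreshold`. The helper name `broadcast_conf_flags` and the example value of 64 MB are hypothetical, and the thread does not establish that any of these properties explains the YARN vs. Kubernetes gap; treat it as a starting point for testing only.

```python
def broadcast_conf_flags(threshold_mb=10, timeout_s=300, block_size="4m"):
    """Build spark-submit --conf flags for broadcast-related settings.

    Hypothetical helper for illustration; the defaults mirror Spark's own
    defaults (10 MB threshold, 300 s timeout, 4m block size).
    """
    settings = {
        # Tables below this size (in bytes) are broadcast automatically in joins.
        "spark.sql.autoBroadcastJoinThreshold": str(threshold_mb * 1024 * 1024),
        # Seconds to wait for a broadcast to complete in a broadcast join.
        "spark.sql.broadcastTimeout": str(timeout_s),
        # Size of each piece of a broadcast block.
        "spark.broadcast.blockSize": block_size,
        # Compress broadcast variables before sending them.
        "spark.broadcast.compress": "true",
    }
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Example: raise the automatic broadcast threshold to 64 MB (assumed value).
    print(" ".join(broadcast_conf_flags(threshold_mb=64)))
```

Running the same job with identical flags on both YARN and Kubernetes keeps the comparison like for like.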

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Please check whether there are resource constraints on the pods/nodes, especially if they are shared. MinIO connectivity performance also needs to be checked. With YARN and the external Spark shuffle service, the Spark shuffle is much more optimized, so we can experience slowness with Spark on k8s, especially if there is a

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Karan, I am using open-source Spark on Kubernetes and the MapR Spark bundle on YARN. Launching the job takes the same 10 seconds with both approaches. For shuffle I am using local storage on both YARN and Kubernetes. On Apr 11, 2025, at 11:24 AM, karan alang wrote: Hi Prem, Which distribution of

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Hi Prem, which distribution of Spark are you using? How long does it take to launch the job? With respect to Spark shuffle, which approach are you using: storing shuffle data in MinIO or using a host path? Regards, Karan On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo wrote: > Hello Team, > I have a pecu

SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Team, I have a peculiar case of Spark slowness. I am using MinIO as object storage, which Spark reads from and writes to. With YARN as the master, the Spark job takes ~5 minutes; the same job run with Kubernetes as the master takes ~8 minutes. I checked the Spark DAG in both a