Thanks, Karan, for the input. I found that the slowness on Kubernetes comes from broadcasting: it takes 2 to 3 times longer than on YARN. I need to check why there is such a big difference. In our case we manually broadcast around 40 tables, since each table is larger than 10 MB. What can be done on Kubernetes so that the broadcast time is on par with YARN?
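One way to make the two runs easier to compare is to pin the broadcast-related settings explicitly and pass the same values to both the YARN and the Kubernetes submission. The sketch below only builds `--conf` arguments. The helper names `broadcast_conf` and `to_submit_args` are hypothetical, and the 64 MB and 600 s values are placeholders, not recommendations. It assumes the manual broadcasts are needed because the tables exceed Spark's default 10 MB `spark.sql.autoBroadcastJoinThreshold`. It will not close the gap by itself; it only removes one source of difference between the two environments.

```python
# Sketch: build broadcast-related Spark settings so the YARN and Kubernetes
# runs can be launched with identical values. Helper names are hypothetical.


def broadcast_conf(threshold_mb: int, timeout_s: int) -> dict:
    """Return Spark SQL broadcast settings as string key/value pairs."""
    return {
        # Maximum table size (in bytes) that Spark broadcasts automatically
        # in a join; the default is 10 MB, i.e. 10485760 bytes.
        "spark.sql.autoBroadcastJoinThreshold": str(threshold_mb * 1024 * 1024),
        # Seconds to wait for a broadcast to complete before the join fails.
        "spark.sql.broadcastTimeout": str(timeout_s),
    }


def to_submit_args(conf: dict) -> list:
    """Flatten a conf dict into spark-submit style ['--conf', 'k=v', ...]."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # 64 MB and 600 s are illustrative values only.
    print(" ".join(to_submit_args(broadcast_conf(64, 600))))
```

Since a broadcast table is first collected on the driver and then shipped to the executors, in client mode the network path between the driver host and the executor pods is worth measuring as well.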
Note: we are using client mode on both YARN and Kubernetes.

On Apr 11, 2025, at 2:07 PM, karan alang <karan.al...@gmail.com> wrote:
Please check whether there are resource constraints on the pods/nodes, especially if they are shared. MinIO connectivity performance also needs to be checked.
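A rough first check of MinIO connectivity is to time the TCP connect from the driver host and from inside an executor pod, then compare the two. The sketch below uses only the Python standard library. The function names are hypothetical, and the endpoint in the usage comment is a placeholder for the real MinIO host and port. It measures connection setup only, not S3 read/write throughput.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, timeout_s: float = 3.0) -> float:
    """Time a single TCP connect to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout_s):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host: str, port: int, samples: int = 5) -> float:
    """Return the median connect time over several attempts."""
    return statistics.median(tcp_connect_ms(host, port) for _ in range(samples))


# Usage (placeholder endpoint, replace with the real MinIO host and port):
#   print(f"{median_connect_ms('minio.example.internal', 9000):.1f} ms")
```

A large difference between the driver host and the pods would point at pod networking rather than at Spark itself.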
With YARN and the external Spark shuffle service, shuffle is a lot more optimized, so we can see slowness with Spark on k8s, especially if there is a pod restart. Have you looked at Apache Uniffle or Celeborn as a remote shuffle service for Spark?
FYI, I'm using the Kubeflow Spark Operator and am in the process of doing a performance comparison/optimization as well.

Regards,
Karan Alang

Hello Karan,
I am using open-source Spark on Kubernetes and the MapR Spark bundle on YARN.
Launching the job takes the same 10 seconds in both setups.
For shuffle, I am using local storage on both YARN and Kubernetes.

Hi Prem,
Which distribution of Spark are you using? How long does it take to launch the job? With respect to Spark shuffle, which approach are you using: storing shuffle data in MinIO or using a host path?
Regards,
Karan

Hello Team,
I have a peculiar case of Spark slowness. I am using MinIO as object storage, which Spark reads data from and writes data to. With YARN as the master, a Spark job takes ~5 minutes; the same job run with Kubernetes as the master takes ~8 minutes.
I checked the Spark DAG in both and observed the same number of jobs, stages, and tasks. I am using the same machines for both YARN and Kubernetes.
One observation: when I disabled Spark dynamic allocation and used static allocation instead, the execution time of the Kubernetes-based Spark job came down to ~5.5 minutes.
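The two allocation modes compared above can be written out as explicit settings. This is a minimal sketch, not the configuration actually used in the job: the function names and executor counts are hypothetical. It assumes Spark 3.x on Kubernetes, where there is no external shuffle service and dynamic allocation therefore relies on shuffle tracking.

```python
# Sketch: static vs. dynamic executor allocation as Spark conf dicts.
# Function names and executor counts are illustrative assumptions.


def static_allocation_conf(executors: int) -> dict:
    """Fixed number of executors, requested up front."""
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(executors),
    }


def dynamic_allocation_conf(min_executors: int, max_executors: int) -> dict:
    """Dynamic allocation suitable for Kubernetes (no external shuffle service)."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Without an external shuffle service, Spark tracks which executors
        # still hold shuffle data before releasing them.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        # Start at the minimum so those executors are requested immediately.
        "spark.dynamicAllocation.initialExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


if __name__ == "__main__":
    for name, conf in [
        ("static", static_allocation_conf(8)),
        ("dynamic", dynamic_allocation_conf(4, 8)),
    ]:
        print(name)
        for key in sorted(conf):
            print(f"  --conf {key}={conf[key]}")
```

If dynamic allocation is still wanted, raising `minExecutors` toward the static count may reduce the time spent waiting for executor pods to start, which is one plausible contributor to the difference between ~8 and ~5.5 minutes.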
May I ask the team what could be the reason the Spark job runs slower on Kubernetes, and what can be done to make it faster? Note: I am using Spark 3.2 in both.