Please check whether there are resource constraints on the pods/nodes,
especially if they are shared.
MinIO connectivity performance also needs to be checked.
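
As a rough sketch of what I mean, the snippet below builds the spark-submit
arguments that pin executor resources and tune the S3A connection to MinIO.
The configuration keys are standard Spark / Hadoop S3A settings, but every
value (executor counts, memory, the MinIO endpoint) and the helper name
to_submit_args are assumptions for illustration, not settings from your
cluster.

```python
# Hypothetical values throughout; only the configuration keys are real
# Spark / Hadoop S3A settings.

def to_submit_args(conf):
    """Turn a dict of Spark settings into a flat list of spark-submit arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Make CPU requests equal to limits so the executor pods are not throttled
# or squeezed by other workloads on shared nodes.
resource_conf = {
    "spark.executor.instances": "4",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.kubernetes.executor.request.cores": "4",
    "spark.kubernetes.executor.limit.cores": "4",
}

# S3A settings that commonly matter when talking to MinIO.
minio_conf = {
    "spark.hadoop.fs.s3a.endpoint": "http://minio.example.internal:9000",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.hadoop.fs.s3a.connection.maximum": "100",
}

submit_args = to_submit_args({**resource_conf, **minio_conf})
print(" ".join(submit_args))
```

Running the YARN and k8s jobs with the same values here makes the
comparison fairer before looking at anything else.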

With YARN and the external Spark shuffle service, shuffle is much more
optimized, so we can experience slowness with Spark on k8s, especially if
there is a pod restart.
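
To make the allocation side of this concrete, here is a minimal sketch of
the two setups worth comparing on k8s. Spark 3.2 on k8s does not use the
external shuffle service, so dynamic allocation there relies on shuffle
tracking instead. The helper name allocation_conf and the executor counts
are assumptions for illustration.

```python
def allocation_conf(dynamic, executors):
    """Return Spark settings for dynamic or static executor allocation on k8s."""
    if dynamic:
        return {
            "spark.dynamicAllocation.enabled": "true",
            # No external shuffle service on k8s in Spark 3.2, so executors
            # holding shuffle files are tracked and kept alive instead.
            "spark.dynamicAllocation.shuffleTracking.enabled": "true",
            "spark.dynamicAllocation.minExecutors": str(executors),
            "spark.dynamicAllocation.maxExecutors": str(executors * 2),
        }
    # Static allocation: a fixed number of executors for the whole job.
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(executors),
    }


for dynamic in (True, False):
    for key, value in allocation_conf(dynamic, executors=4).items():
        print(f"--conf {key}={value}")
```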

Have you looked at Apache Uniffle or Apache Celeborn as a remote shuffle
service for Spark?
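
For Celeborn, the client-side Spark settings would look roughly like the
sketch below. This is from my reading of the Celeborn docs, so please
verify the keys against the Celeborn version you deploy; the helper name
celeborn_conf and the master host names are made up for illustration.

```python
def celeborn_conf(master_endpoints):
    """Spark settings that route shuffle through a Celeborn cluster (sketch)."""
    return {
        # Replace Spark's built-in sort shuffle with the Celeborn client.
        "spark.shuffle.manager":
            "org.apache.spark.shuffle.celeborn.SparkShuffleManager",
        # Comma-separated host:port list of the Celeborn masters.
        "spark.celeborn.master.endpoints": ",".join(master_endpoints),
        # The external shuffle service is not used together with Celeborn.
        "spark.shuffle.service.enabled": "false",
    }


# Hypothetical endpoints; substitute the addresses of your Celeborn masters.
conf = celeborn_conf([
    "celeborn-master-0.example.internal:9097",
    "celeborn-master-1.example.internal:9097",
])
for key, value in conf.items():
    print(f"--conf {key}={value}")
```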

FYI, I'm using the Kubeflow Spark Operator and am in the process of doing a
performance comparison/optimization as well.

regds,
Karan Alang


On Fri, Apr 11, 2025 at 9:07 AM Prem Sahoo <prem.re...@gmail.com> wrote:

> Hello Karan,
> I am using open-source Spark on Kubernetes and the MapR Spark bundle on YARN.
>
> Launching the job takes the same 10 secs with both approaches.
>
> For shuffle, I am using local storage on both YARN and Kubernetes.
>
> On Apr 11, 2025, at 11:24 AM, karan alang <karan.al...@gmail.com> wrote:
>
> Hi Prem,
>
> Which distribution of Spark are you using?
> How long does it take to launch the job?
> W.r.t. Spark shuffle, which approach are you using: storing shuffle
> data in MinIO or using a host path?
>
> regds,
> Karan
>
> On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> Hello Team,
>> I have a peculiar case of Spark slowness.
>> I am using MinIO as object storage, which Spark reads from and writes to.
>> With YARN as the master, a Spark job takes ~5 mins; the same job run
>> with Kubernetes as the master takes ~8 mins.
>>
>> I checked the Spark DAG in both and observed the same number of jobs,
>> stages, and tasks. I am using the same machines for both YARN and
>> Kubernetes.
>>
>> One observation: when I disabled Spark dynamic allocation and used
>> static allocation, the execution time of the Kubernetes-based Spark job
>> was ~5.5 mins.
>>
>> May I ask the team what could be the reason the Spark job runs slower on
>> Kubernetes and what can be done to make it faster?
>> Note: I am using Spark 3.2 in both.
>>
>>
