AFAIK there is no way to set it for each Exchange.
If you are using Spark 3.2+, AQE (Performance Tuning - Spark 3.3.0 Documentation:
<https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution>)
is enabled by default. It can automatically change the number of post-shuffle
partitions at each exchange, which should help your use case. The feature itself
has been available since Spark 3.0.
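
A minimal sketch of the AQE settings involved, in case it helps. The config keys
are the ones listed on the tuning page above; the values ("64m", "2000") and the
helper names are only illustrative assumptions, not tuned recommendations.

```python
# AQE settings that let Spark choose the post-shuffle partition count at each
# exchange. Values are illustrative assumptions, not recommendations.
AQE_CONF = {
    # On by default since Spark 3.2.0; set it explicitly on 3.0/3.1.
    "spark.sql.adaptive.enabled": "true",
    # Merge small post-shuffle partitions after each exchange.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Target size of a coalesced partition (assumed value).
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64m",
    # Partition count before coalescing; keep it high enough for the big
    # tables and let AQE shrink it for the small ones (assumed value).
    "spark.sql.adaptive.coalescePartitions.initialPartitionNum": "2000",
}


def to_submit_args(conf):
    """Render a config dict as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def apply_to_builder(builder, conf):
    """Apply the settings to a SparkSession.builder-like object (hypothetical helper)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With pyspark installed, apply_to_builder(SparkSession.builder, AQE_CONF).getOrCreate()
would start a session with these settings; to_submit_args(AQE_CONF) gives the
equivalent spark-submit flags.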

From: Anupam Singh <[email protected]>
Sent: Saturday, September 10, 2022 10:23 PM
To: Vibhor Gupta <[email protected]>
Cc: [email protected]
Subject: [EXTERNAL] Re: Dynamic shuffle partitions in a single job

Commenting for better reach :)

On Thu, Sep 8, 2022, 11:56 AM Vibhor Gupta <[email protected]> wrote:
Hi Community,

Is it possible to set the number of shuffle partitions per exchange?

My Spark query contains a lot of joins/aggregations involving big tables and
small tables. Keeping a high value of spark.sql.shuffle.partitions helps with
the big tables, but for the small tables it creates a lot of overhead on the
scheduler.

Cheers,
Vibhor
