AFAIK there is no way to set it at each Exchange. If you are using Spark 3.2+, AQE (Performance Tuning - Spark 3.3.0 Documentation <https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution>) is enabled by default, and it can automatically change the post-shuffle partition number at each exchange, which should help your use case. The feature itself is available from Spark 3.0 onwards.
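In case it helps, here is a minimal sketch of the relevant settings (the values below are only placeholders, not recommendations); on Spark 3.2+ the first two flags are already true by default:

// Sketch: enabling AQE and post-shuffle partition coalescing explicitly.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("aqe-coalesce-example")
  // Turn on Adaptive Query Execution.
  .config("spark.sql.adaptive.enabled", "true")
  // Let AQE coalesce small post-shuffle partitions at each exchange.
  .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
  // Initial (upper bound) number of shuffle partitions AQE starts from;
  // it coalesces down from here based on runtime shuffle statistics.
  .config("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "2000")
  // Advisory target size per coalesced partition.
  .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")
  .getOrCreate()

With this in place you can keep a high initial partition number for the big-table shuffles and let AQE shrink it where the shuffled data is small.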
From: Anupam Singh <[email protected]>
Sent: Saturday, September 10, 2022 10:23 PM
To: Vibhor Gupta <[email protected]>
Cc: [email protected]
Subject: Re: Dynamic shuffle partitions in a single job

Commenting for better reach :)

On Thu, Sep 8, 2022, 11:56 AM Vibhor Gupta <[email protected]> wrote:

Hi Community,

Is it possible to set the number of shuffle partitions per exchange?

My Spark query contains a lot of joins/aggregations involving both big tables and small tables. Keeping a high value of spark.sql.shuffle.partitions helps with the big tables, but for the small tables it creates a lot of overhead on the scheduler.

Cheers,
Vibhor
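To illustrate the scenario in the question (table and column names here are made up), a single global setting is shared by every exchange in the job:

// Hypothetical sketch: one spark.sql.shuffle.partitions value applies to all shuffles,
// so a value sized for the big-table join is far too high for the small-table aggregation.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("shuffle-partitions-demo").getOrCreate()

// Sized for the large join ...
spark.conf.set("spark.sql.shuffle.partitions", "2000")

val bigJoin = spark.table("big_fact")
  .join(spark.table("big_dim"), "id")   // 2000 shuffle partitions: appropriate here

val smallAgg = spark.table("small_lookup")
  .groupBy("category")
  .count()                              // also 2000 partitions: mostly empty tasks, scheduler overhead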
