Hi,

I was wondering what the rationale is behind defaulting all repartitioning
to spark.sql.shuffle.partitions. I'm seeing significant overhead when
running a job whose input has only 2 partitions: with the default value of
spark.sql.shuffle.partitions, every shuffle now produces 200 partitions.
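For concreteness, here's a minimal sketch of the behavior I'm seeing (the
session setup, app name, and column names are just illustrative, and I've
disabled adaptive execution so the raw default shows through):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-partitions-demo")
  .master("local[*]")
  // Disable adaptive execution so the raw default is visible
  // (newer Spark versions may coalesce shuffle partitions otherwise).
  .config("spark.sql.adaptive.enabled", "false")
  .getOrCreate()
import spark.implicits._

// A tiny input with only 2 partitions.
val df = spark.range(0L, 100L, 1L, numPartitions = 2).toDF("id")
println(df.rdd.getNumPartitions)   // 2

// Any shuffle (groupBy, join, etc.) produces
// spark.sql.shuffle.partitions partitions -- 200 by default.
val counts = df.groupBy($"id" % 10).count()
println(counts.rdd.getNumPartitions)   // 200

// Setting the property explicitly avoids the overhead for small inputs:
spark.conf.set("spark.sql.shuffle.partitions", "2")
println(df.groupBy($"id" % 10).count().rdd.getNumPartitions)   // 2

Thanks.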

-Teng Fei Liao
