spark.sql.shuffle.partitions is still used, as far as I can tell. I can see it in the code <https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L191> and in the documentation page <https://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options>.
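
For small local test runs, you can lower it when building the session. A minimal sketch (the value of 4 is just an illustrative choice for tiny test data, not a recommendation):

    import org.apache.spark.sql.SparkSession

    // Build a local SparkSession for unit tests with a small number of
    // shuffle partitions instead of the default 200.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("unit-tests")
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    // Or change it at runtime on an existing session:
    spark.conf.set("spark.sql.shuffle.partitions", "4")
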
On Wed, Sep 13, 2017 at 4:46 AM, peay <p...@protonmail.com> wrote:

> Hello,
>
> I am running unit tests with Spark DataFrames, and I am looking for
> configuration tweaks that would make tests faster. Usually, I use a
> local[2] or local[4] master.
>
> Something that has been bothering me is that most of my stages end up
> using 200 partitions, independently of whether I repartition the input.
> This seems a bit overkill for small unit tests that barely have 200 rows
> per DataFrame.
>
> spark.sql.shuffle.partitions used to control this, I believe, but it seems
> to be gone and I could not find any information on what mechanism/setting
> replaces it or the corresponding JIRA.
>
> Has anyone experience to share on how to tune Spark best for very small
> local runs like that?
>
> Thanks!
>
> --
> Cheers!