Hi,

I'm wondering what's so special about 200 that it was chosen as the
default value of spark.shuffle.sort.bypassMergeThreshold.

Is it an arbitrary number, or is there some theory behind it?
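For context, here's my reading of the check that uses it -- a condensed
paraphrase of SortShuffleWriter.shouldBypassMergeSort from the Spark
sources (not verbatim; imports added for completeness):

import org.apache.spark.{ShuffleDependency, SparkConf}

// Bypass the merge-sort path only when there is no map-side combine
// and the number of reduce partitions is at most the threshold.
def shouldBypassMergeSort(conf: SparkConf, dep: ShuffleDependency[_, _, _]): Boolean = {
  if (dep.mapSideCombine) {
    false // map-side aggregation requires the sort-based path
  } else {
    val bypassMergeThreshold = conf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200)
    dep.partitioner.numPartitions <= bypassMergeThreshold
  }
}

If I read it right, with the default any shuffle with at most 200 reduce
partitions (and no map-side combine) writes one file per reduce partition
and concatenates them, skipping the sort entirely -- but that still
doesn't explain why 200 rather than, say, 100 or 500.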

Is the default number of shuffle partitions in Spark SQL, i.e.
spark.sql.shuffle.partitions = 200, somehow related to
spark.shuffle.sort.bypassMergeThreshold?

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res3: Int = 200
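
The 200 above seems to come from spark.sql.shuffle.partitions rather
than from the bypass threshold -- lowering it changes the result
(output paraphrased from a local spark-shell session):

scala> spark.conf.set("spark.sql.shuffle.partitions", 10)

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res4: Int = 10

So the two 200s look independent, and I'm curious whether they merely
coincide or share a common rationale.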

I'd appreciate any pointers that would help me get the gist of this
seemingly magic number. Thanks!

Best regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
