Hi,

I'm wondering what's so special about 200 that it was chosen as the
default value of spark.shuffle.sort.bypassMergeThreshold.

Is it an arbitrary number, or is there some theory behind it?
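For context, here's my reading of the check that uses it -- a condensed
paraphrase of SortShuffleWriter.shouldBypassMergeSort from the Spark
sources (not verbatim; imports added for completeness):

import org.apache.spark.{ShuffleDependency, SparkConf}

// Bypass the merge-sort path only when there is no map-side combine
// and the number of reduce partitions is at most the threshold.
def shouldBypassMergeSort(conf: SparkConf, dep: ShuffleDependency[_, _, _]): Boolean = {
  if (dep.mapSideCombine) {
    false // map-side aggregation requires the sort-based path
  } else {
    val bypassMergeThreshold = conf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200)
    dep.partitioner.numPartitions <= bypassMergeThreshold
  }
}

If I read it right, with the default any shuffle with at most 200 reduce
partitions (and no map-side combine) writes one file per reduce partition
and concatenates them, skipping the sort entirely -- but that still
doesn't explain why 200 rather than, say, 100 or 500.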

Is the default number of shuffle partitions in Spark SQL, i.e.
spark.sql.shuffle.partitions = 200, somehow related to
spark.shuffle.sort.bypassMergeThreshold?

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res3: Int = 200
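
The 200 above seems to come from spark.sql.shuffle.partitions rather
than from the bypass threshold -- lowering it changes the result
(output paraphrased from a local spark-shell session):

scala> spark.conf.set("spark.sql.shuffle.partitions", 10)

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res4: Int = 10

So the two 200s look independent, and I'm curious whether they merely
coincide or share a common rationale.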

I'd appreciate any pointers that would help me get the gist of this
seemingly magic number. Thanks!

Best regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
