https://github.com/apache/spark/pull/1799 seems to be the first PR to introduce this number, but there is no explanation for the choice of 200 there either.
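For what it's worth, the two 200s come from two separate settings that just happen to share a default: spark.sql.shuffle.partitions (which your getNumPartitions call reflects) and spark.shuffle.sort.bypassMergeThreshold. As I understand it, the sort-based shuffle only takes the hash-style bypass path when the number of reduce partitions is at most the threshold and there is no map-side aggregation. A quick sketch to check both values (assuming a plain spark-shell session with default configuration; the threshold is usually not set in SparkConf, so I pass its documented default explicitly):

scala> spark.conf.get("spark.sql.shuffle.partitions")
res0: String = 200

scala> spark.sparkContext.getConf.get("spark.shuffle.sort.bypassMergeThreshold", "200")
res1: String = 200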
Jacek Laskowski wrote:
> Hi,
>
> I'm wondering what's so special about 200 to have it as the default value
> of spark.shuffle.sort.bypassMergeThreshold?
>
> Is this an arbitrary number? Is there any theory behind it?
>
> Is the number of partitions in Spark SQL, i.e. 200, somehow related to
> spark.shuffle.sort.bypassMergeThreshold?
>
> scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
> res3: Int = 200
>
> I'd appreciate any guidance to get the gist of this seemingly magic
> number. Thanks!
>
> Best regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/