Hi,

I'm wondering what's so special about 200 that makes it the default value of spark.shuffle.sort.bypassMergeThreshold. Is it an arbitrary number, or is there some theory behind it? And is the default number of shuffle partitions in Spark SQL (spark.sql.shuffle.partitions), which is also 200, somehow related to spark.shuffle.sort.bypassMergeThreshold?

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res3: Int = 200
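For reference, here's how I've been comparing the two settings in spark-shell (a minimal sketch: since spark.shuffle.sort.bypassMergeThreshold is a core, non-SQL conf that isn't set explicitly in a vanilla session, I read it off SparkConf with its documented default of 200 as the fallback; the outputs below are from my local build):

scala> // Spark SQL's default number of shuffle partitions
scala> spark.conf.get("spark.sql.shuffle.partitions")
res0: String = 200

scala> // Core shuffle conf, using its documented default (200) as the fallback
scala> sc.getConf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200)
res1: Int = 200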
I'd appreciate any guidance to get the gist of this seemingly magic number. Thanks!

Best regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski