https://github.com/apache/spark/pull/1799 seems to be the first PR to introduce this number, but there is no explanation for the choice of 200 there either.
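For what it's worth, the two 200s come from two separate settings that just happen to share a default: spark.sql.shuffle.partitions (which your getNumPartitions call reflects) and spark.shuffle.sort.bypassMergeThreshold. As I understand it, the sort-based shuffle only takes the hash-style bypass path when the number of reduce partitions is at most the threshold and there is no map-side aggregation. A quick sketch to check both values (assuming a plain spark-shell session with default configuration; the threshold is usually not set in SparkConf, so I pass its documented default explicitly):

scala> spark.conf.get("spark.sql.shuffle.partitions")
res0: String = 200

scala> spark.sparkContext.getConf.get("spark.shuffle.sort.bypassMergeThreshold", "200")
res1: String = 200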
Jacek Laskowski wrote:
> Hi,
>
> I'm wondering what's so special about 200 to have it as the default value
> of spark.shuffle.sort.bypassMergeThreshold?
>
> Is this an arbitrary number? Is there any theory behind it?
>
> Is the number of partitions in Spark SQL, i.e. 200, somehow related to
> spark.shuffle.sort.bypassMergeThreshold?
>
> scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
> res3: Int = 200
>
> I'd appreciate any guidance to get the gist of this seemingly magic
> number. Thanks!
>
> Best regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/