Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread cheez

Do we have a data structure that corresponds to buckets in shuffle? That is, if we wanted to explore the 'content' of these buckets during the shuffle phase, can we do that? If yes, how? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-are-Buckets-referr

Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread Reynold Xin

There are two uses of buckets in Spark core. The first is in the histogram used for sorting: we build an approximate histogram of the data to decide how to partition it for the sort, and each bucket is a range in that histogram. The second is in shuffle, where
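
To make the histogram usage concrete, here is a minimal sketch in plain Python. It is not Spark's code: the helper names `range_bounds` and `bucket_for`, the sample size, and the bucket count are all assumptions made for illustration. It samples the keys, picks cut points so that each range holds roughly the same number of keys, and then routes every key to the bucket (range) it falls into.

```python
import bisect
import random


def range_bounds(sample, num_buckets):
    """Pick num_buckets - 1 cut points from a sample (hypothetical helper).

    The cut points approximate an equi-depth histogram of the full data.
    """
    ordered = sorted(sample)
    step = len(ordered) / num_buckets
    return [ordered[int(step * i)] for i in range(1, num_buckets)]


def bucket_for(key, bounds):
    """Return the index of the range (bucket) that key falls into."""
    return bisect.bisect_left(bounds, key)


random.seed(0)
num_buckets = 4
data = [random.randint(0, 999) for _ in range(10_000)]

# Build the approximate histogram from a small sample, not the full data.
sample = random.sample(data, 200)
bounds = range_bounds(sample, num_buckets)

# Route every key to its bucket.
buckets = [[] for _ in range(num_buckets)]
for key in data:
    buckets[bucket_for(key, bounds)].append(key)

# Each bucket covers a disjoint key range, so sorting the buckets
# independently and concatenating them gives a globally sorted result.
result = [k for b in buckets for k in sorted(b)]
```

Because the bounds come from a sample, the buckets are only roughly equal in size, which is the sense in which the histogram is approximate.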