Do we have a data structure that corresponds to buckets in Shuffle? That is,
if we wanted to explore the 'content' of these buckets in the shuffle phase, can
we do that? If yes, how?
--
There are two uses of buckets in Spark core.
The first is in a histogram, used for sorting. Basically we
build an approximate histogram of the data in order to decide how to
partition the data for sorting. Each bucket is a range in the histogram.
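
To make the histogram idea concrete, here is a minimal sketch in plain Python. It is not Spark's actual code: the function names (range_bounds, bucket_for, partition_by_range) and the sample_size and seed parameters are made up for illustration. It only shows the general technique of sampling the keys, picking range boundaries from the sample, and assigning each key to the bucket whose range contains it.

```python
import bisect
import random


def range_bounds(sample, num_buckets):
    """Pick num_buckets - 1 upper bounds from a sample of the keys.

    The sorted sample acts as an approximate histogram of the full data set.
    """
    ordered = sorted(sample)
    if not ordered or num_buckets <= 1:
        return []
    step = len(ordered) / num_buckets
    return [ordered[min(int(step * i), len(ordered) - 1)]
            for i in range(1, num_buckets)]


def bucket_for(key, bounds):
    """Return the index of the bucket (range) that the key falls into."""
    # Keys equal to a bound go to the lower bucket.
    return bisect.bisect_left(bounds, key)


def partition_by_range(data, num_buckets, sample_size=100, seed=0):
    """Split data into range buckets using bounds estimated from a sample."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only
    sample = rng.sample(data, min(sample_size, len(data)))
    bounds = range_bounds(sample, num_buckets)
    buckets = [[] for _ in range(num_buckets)]
    for key in data:
        buckets[bucket_for(key, bounds)].append(key)
    return bounds, buckets
```

Because every key in bucket i is less than or equal to every key in bucket i + 1, sorting each bucket on its own and concatenating the results gives a globally sorted output. The buckets are only roughly equal in size, since the bounds come from a sample rather than from the full data.
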
The 2nd is used in shuffle, where