The term "buckets" is used in two ways in Spark core.

The first is in the histogram used for sorting. We build an approximate
histogram of the data in order to decide how to partition it for the
sort. Each bucket is one range in that histogram.
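
To make the first usage concrete, here is a rough sketch in plain Python
(not Spark's actual code). It assumes we sample the keys, pick split
points from the sorted sample, and send each key to the range it falls
in. The names range_bounds and bucket_for are made up for illustration.

```python
import bisect
import random

def range_bounds(sample, num_partitions):
    """Pick num_partitions - 1 split points from a sample of the keys."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]

def bucket_for(key, bounds):
    """Index of the range (bucket) that the key falls into."""
    return bisect.bisect_left(bounds, key)

random.seed(0)
data = [random.randint(0, 999) for _ in range(10_000)]

# Approximate the key distribution from a small sample.
sample = random.sample(data, 200)
bounds = range_bounds(sample, 4)

# Route every key to its bucket.
buckets = [[] for _ in range(4)]
for x in data:
    buckets[bucket_for(x, bounds)].append(x)

# Sorting each bucket on its own and concatenating gives a total order,
# because every key in bucket i is <= every key in bucket i + 1.
result = [x for b in buckets for x in sorted(b)]
```

The sample only affects how evenly the buckets are filled; the final
order is correct whatever split points are chosen.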

The second is in shuffle, where we partition the output of each map task
into different "buckets" so that the reduce side can fetch the map-side
data by partition id.
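
A similar sketch for the shuffle case, again in plain Python rather than
Spark's own code. It assumes a hash-based partition id (CRC32 modulo the
number of reducers, chosen here only so the result is deterministic);
map_task and reduce_task are hypothetical names.

```python
import zlib

NUM_REDUCERS = 3

def partition_id(key, num_partitions):
    """Deterministic partition id for a string key (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def map_task(records, num_partitions):
    """Split one map task's output into one bucket per reducer."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_id(key, num_partitions)].append((key, value))
    return buckets

def reduce_task(reduce_id, all_map_outputs):
    """Fetch bucket `reduce_id` from every map output and sum per key."""
    totals = {}
    for buckets in all_map_outputs:
        for key, value in buckets[reduce_id]:
            totals[key] = totals.get(key, 0) + value
    return totals

map_outputs = [
    map_task([("a", 1), ("b", 2), ("a", 3)], NUM_REDUCERS),
    map_task([("b", 4), ("c", 5)], NUM_REDUCERS),
]

# Each key lands in exactly one bucket index, so the per-reducer
# results never overlap and can simply be merged.
merged = {}
for rid in range(NUM_REDUCERS):
    merged.update(reduce_task(rid, map_outputs))
```

The point is that a reducer only ever reads its own bucket from each map
output, never the whole output.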


On Sun, Aug 2, 2015 at 1:55 PM, Haseeb <11besemja...@seecs.edu.pk> wrote:

> Hi all,
> I am a newbie trying to understand Spark internals. There is some entity
> referred to as 'buckets' in many places in the Spark Core code, but I am
> having a hard time understanding what it is: it is only mentioned in code
> comments, and I didn't come across any data structure or class that
> refers to it. I'd be really grateful if someone could shed some light on
> what exactly buckets are and what their function is with respect to
> Spark internals.
