What are 'Buckets' referred to in Spark Core code

2015-08-02 Thread Haseeb
Hi all, I am a newbie trying to understand Spark internals. There is some entity referred to as 'buckets' in many places in the Spark Core code, but I am having a hard time understanding what it is, as it is only mentioned in code comments and I didn't come across any data structure that referred to it or any class for th

Re: Switch from Sort based to Hash based shuffle

2015-08-15 Thread Muhammad Haseeb Javed
Thanks guys, that did it.
On Thu, Aug 13, 2015 at 6:49 PM, Akhil Das wrote:
> Have a look at spark.shuffle.manager. You can switch between sort and hash
> with this configuration.
>
> spark.shuffle.manager | sort | Implementation to use for shuffling data. There
> are two implementations available:sor
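
For reference, the setting named in the quoted reply can also be applied programmatically. The following is only a minimal sketch, assuming a Spark 1.x build with spark-core on the classpath (the hash-based manager was dropped in later major releases); the object name, application name, and local master are hypothetical placeholders, not taken from the thread.

```scala
import org.apache.spark.SparkConf

// Hypothetical wrapper object, used only for illustration.
object ShuffleManagerExample {
  // Build a configuration that selects the hash-based shuffle
  // instead of the default sort-based one (Spark 1.x short names).
  val conf: SparkConf = new SparkConf()
    .setAppName("shuffle-manager-example") // placeholder name
    .setMaster("local[2]")                 // placeholder master
    .set("spark.shuffle.manager", "hash")  // use "sort" to switch back

  def main(args: Array[String]): Unit = {
    // Print the effective value so the choice is visible at startup.
    println(conf.get("spark.shuffle.manager"))
  }
}
```

The same key can instead be passed on the command line with `--conf spark.shuffle.manager=hash` when submitting the job, which avoids hard-coding the choice in the application.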

Communication between executors and drivers

2015-09-16 Thread Muhammad Haseeb Javed
How do executors communicate with the driver in Spark? I understand that it is done using Akka actors and that messages are exchanged as CoarseGrainedSchedulerMessage, but I'd really appreciate it if someone could explain the entire process in a bit more detail.

Wrap an RDD with a ShuffledRDD

2015-11-08 Thread Muhammad Haseeb Javed
I am working on a modified Spark core and have a Broadcast variable which I deserialize to obtain an RDD along with its set of dependencies, as is done in ShuffleMapTask, as follows:
val taskBinary: Broadcast[Array[Byte]]
var (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](