Switch from Sort based to Hash based shuffle

2015-08-13 Thread cheez
I understand that the current master branch of Spark uses sort-based shuffle. Is there a way to change that to hash-based shuffle, just for experimental purposes, by modifying the source code? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Switch-from

Bucket mappings of map stage output

2015-08-06 Thread cheez
Hey all. I was trying to understand Spark internals by looking into (and hacking) the code. I was basically trying to explore the buckets that are generated when we partition the output of each map task and then let the reduce side fetch them on the basis of partitionId. I went into the write() m

Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread cheez
Do we have a data structure that corresponds to buckets in shuffle? That is, if we wanted to explore the 'content' of these buckets in the shuffle phase, can we do that? If yes, how? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-are-Buckets-referr