I understand that the current master branch of Spark uses sort-based shuffle.
Is there a way to change that to hash-based shuffle, just for experimental
purposes, by modifying the source code?
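(For what it's worth: in the released Spark 1.x line, both shuffle implementations shipped and could be switched by configuration alone, without touching the source, via the `spark.shuffle.manager` key. Whether that still works on master depends on whether `HashShuffleManager` is still present there, so this is a 1.x-era sketch, not a statement about master:)

```
# spark-defaults.conf (Spark 1.x): select the shuffle implementation
# valid values were "sort" (the default) and "hash"
spark.shuffle.manager  hash
```

The same setting could be passed per-job with `--conf spark.shuffle.manager=hash` on `spark-submit`, or via `SparkConf.set` in the driver.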
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Switch-from
Hey all.
I was trying to understand Spark internals by looking into (and hacking)
the code. I was basically trying to explore the buckets which are generated
when we partition the output of each map task and then let the reduce side
fetch them on the basis of partitionId. I went into the write() m
Do we have a data structure that corresponds to buckets in shuffle? That is,
if we wanted to explore the 'content' of these buckets in the shuffle phase, can
we do that? If yes, how?
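(To illustrate the idea being asked about: a "bucket" is just the slice of one map task's output destined for one reducer, chosen by the partitioner. The sketch below is self-contained Scala with no Spark dependency; `getPartition` mirrors the modulo-on-hashCode logic of Spark's `HashPartitioner`, and `BucketSketch` itself is a made-up name for illustration, not a Spark class:)

```scala
import scala.collection.mutable

object BucketSketch {
  // Same idea as HashPartitioner.getPartition: hash the key and
  // take a non-negative remainder modulo the number of reducers.
  def getPartition(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  def main(args: Array[String]): Unit = {
    val numReducers = 3
    // Pretend this is the (key, value) output of a single map task.
    val mapOutput = Seq("a" -> 1, "b" -> 2, "c" -> 3, "a" -> 4)

    // buckets(i) collects the records destined for reducer i;
    // the reduce side would fetch exactly one such bucket per map task.
    val buckets =
      Array.fill(numReducers)(mutable.Buffer.empty[(String, Int)])
    for (rec <- mapOutput)
      buckets(getPartition(rec._1, numReducers)) += rec

    buckets.zipWithIndex.foreach { case (b, i) =>
      println(s"bucket $i: ${b.mkString(", ")}")
    }
  }
}
```

Note that both `"a"` records always land in the same bucket, which is what lets the reduce side gather all values for a key by fetching a single partitionId from every map task.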
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/What-are-Buckets-referr