Thanks for the response! I'll try out the behaviour with cache().
--
For 2), if the input is a Range, Spark only needs the start and end values
for each partition, so the serialization overhead of a Range is tiny. But
for an ArrayBuffer, Spark has to serialize all of the elements into the task.
That's why the task size is huge in your case.
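
Here is a minimal sketch of the difference, assuming Spark's Scala API with
a local SparkContext (the object name TaskSizeDemo and the element count are
just illustrative). Comparing the task sizes reported in the web UI for the
two jobs should show the effect:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.{SparkConf, SparkContext}

object TaskSizeDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("TaskSizeDemo").setMaster("local[2]"))

    val numElements = 1000000  // illustrative size

    // Range: each partition is described by its start/end bounds, so the
    // serialized task stays small no matter how large the range is.
    val fromRange = sc.parallelize(1 to numElements, 4)
    println(fromRange.count())

    // ArrayBuffer: every element is materialized on the driver and shipped
    // inside the serialized tasks, so task size grows with the data.
    val fromBuffer = sc.parallelize(ArrayBuffer.range(1, numElements + 1), 4)
    println(fromBuffer.count())

    sc.stop()
  }
}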
For 1), Spark does not always transfer the data to