Re: Task size variation while using Range Vs List

2014-11-06 Thread nsareen
Thanks for the response!! Will try to see the behaviour with Cache()

Re: Task size variation while using Range Vs List

2014-11-05 Thread Shixiong Zhu
For 2), if the input is a Range, Spark only needs the start value and the end value for each partition, so the overhead of a Range is small. For an ArrayBuffer, however, Spark has to serialize all of the data into the task, which is why the task size is so large in your case. For 1), Spark does not always travel the data to