On Tue, May 20, 2014 at 6:10 PM, Madhu <ma...@madhu.com> wrote:
> What you suggest looks like an in-memory sort, which is fine if each partition is
> small enough to fit in memory. Is it true that rdd.sortByKey(...) requires
> partitions to fit in memory? I wasn't sure if there was some magic behind
> the scenes that supports arbitrarily large sorts.

Yes, it is an in-memory sort, but so was the Scala version you posted,
so I assumed that was OK for your use case. Regardless of what Spark
does, calling toArray copies all of the values into memory.

sortByKey is something fairly different. It sorts the whole RDD by
key, not values within each key. I think Sandy is talking about
something related but not quite the same.
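
To make the distinction concrete, here is a rough sketch of the two
operations side by side. It is not taken from the code posted earlier in
the thread: the object name, the (String, Int) element type, the sample
data, and the local master setting are all made up for illustration, and
it assumes spark-core is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on older Spark releases
import org.apache.spark.rdd.RDD

// Hypothetical names and data, for illustration only.
object SortSketch {
  // Orders the whole RDD by key. The order of values that share a key
  // is not specified by this call.
  def sortWholeRdd(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
    pairs.sortByKey()

  // Sorts the values inside each key. All values of a single key are
  // materialized in memory at once, so each group has to fit.
  def sortWithinKeys(pairs: RDD[(String, Int)]): RDD[(String, Seq[Int])] =
    pairs.groupByKey().mapValues(_.toSeq.sorted)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("sort-sketch")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("b", 3), ("a", 2), ("b", 1), ("a", 5)))
      sortWholeRdd(pairs).collect().foreach(println)   // records ordered by key
      sortWithinKeys(pairs).collect().foreach(println) // one record per key, values sorted
    } finally {
      sc.stop()
    }
  }
}
```

The first function changes the order of records across the RDD; the
second leaves the keys in no particular order and only sorts inside each
group, which is where the in-memory requirement comes from.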

Do you really mean you want to sort the whole RDD?
