sortByKey currently requires each partition to fit in memory, but there are
plans to add an external sort.
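Until that lands, the usual fallback for a partition that doesn't fit in
memory is an external merge sort: spill sorted runs to temp files, then do a
k-way merge. Here's a minimal sketch in plain Java (the class and method
names are hypothetical, not Spark API) of the kind of routine one could call
from inside mapPartitions; readers are left open for brevity and the run
files are cleaned up on JVM exit:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: externally sort an iterator of strings that may not
// fit in memory, by spilling sorted runs to disk and merging them lazily.
public class ExternalSort {

    // maxInMemory: how many records to buffer before spilling a sorted run
    public static Iterator<String> sort(Iterator<String> input, int maxInMemory)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>(maxInMemory);
        while (input.hasNext()) {
            buffer.add(input.next());
            if (buffer.size() >= maxInMemory) {
                runs.add(spill(buffer));
                buffer.clear();
            }
        }
        if (runs.isEmpty()) {               // everything fit in memory
            Collections.sort(buffer);
            return buffer.iterator();
        }
        if (!buffer.isEmpty()) runs.add(spill(buffer));
        return merge(runs);
    }

    // Sort the buffer and write it out as one run file.
    private static Path spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, buffer);
        run.toFile().deleteOnExit();
        return run;
    }

    // Lazy k-way merge of the sorted run files via a priority queue
    // holding one head element per run: [value, runIndex].
    private static Iterator<String> merge(List<Path> runs) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<String[]> heap =
                new PriorityQueue<>(Comparator.comparing(e -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            BufferedReader r = Files.newBufferedReader(runs.get(i));
            readers.add(r);
            String line = r.readLine();
            if (line != null) heap.add(new String[]{line, String.valueOf(i)});
        }
        return new Iterator<String>() {
            public boolean hasNext() { return !heap.isEmpty(); }
            public String next() {
                String[] top = heap.poll();
                int idx = Integer.parseInt(top[1]);
                try {
                    // Refill the heap from the run we just consumed from.
                    String line = readers.get(idx).readLine();
                    if (line != null) heap.add(new String[]{line, top[1]});
                } catch (IOException e) { throw new UncheckedIOException(e); }
                return top[0];
            }
        };
    }
}
```

Only maxInMemory records plus one line per run are ever held in memory, so
the merge output can be consumed as a lazy iterator, which is what a
mapPartitions-style sort would want to return.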


On Tue, May 20, 2014 at 10:10 AM, Madhu <ma...@madhu.com> wrote:

> Thanks, Sean. I had seen the post you mentioned.
>
> What you suggest looks like an in-memory sort, which is fine if each
> partition is small enough to fit in memory. Is it true that
> rdd.sortByKey(...) requires partitions to fit in memory? I wasn't sure if
> there was some magic behind the scenes that supports arbitrarily large
> sorts.
>
> None of this is a showstopper, it just might require a little more code on
> the part of the developer. If there's a requirement for Spark partitions
> to fit in memory, developers will have to be aware of that and plan
> accordingly. One nice feature of Hadoop MR is the ability to sort very
> large data sets without thinking about data size.
>
> In the case that a developer repartitions an RDD such that some partitions
> don't fit in memory, sorting those partitions requires more work. For
> these cases, I think there is value in a robust partition-sorting method
> that handles them efficiently and reliably.
>
> Is there another solution for sorting arbitrarily large partitions? If not,
> I don't mind developing and contributing a solution.
>
> --
> Madhu
> https://www.linkedin.com/in/msiddalingaiah
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-partitions-in-Java-tp6715p6719.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
