Re: Sorting partitions in Java

2014-05-20 Thread Madhu
Sean, no, I don't want to sort the whole RDD; sortByKey seems to be good enough for that. Right now, I think the code I have will work for me, but I can imagine conditions where it will run out of memory. I'm not completely sure if SPARK-983
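To make the distinction concrete, here is a minimal sketch in plain Java (no Spark dependency) that models an RDD's partitions as a List of Lists. Sorting within partitions orders each partition independently, whereas sortByKey produces one global ordering across all partitions. The class and method names are hypothetical and are not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: each inner List stands in for one RDD partition.
class PartitionSortDemo {

    // Sorts every partition independently. Elements never move between
    // partitions, so the result is ordered within each partition but not
    // necessarily across partition boundaries (unlike a global sortByKey).
    static <T> List<List<T>> sortWithinPartitions(List<List<T>> partitions,
                                                  Comparator<? super T> cmp) {
        List<List<T>> result = new ArrayList<>(partitions.size());
        for (List<T> partition : partitions) {
            List<T> copy = new ArrayList<>(partition); // leave the input untouched
            copy.sort(cmp);
            result.add(copy);
        }
        return result;
    }
}
```

With partitions [5, 1, 3] and [4, 2, 0], the result is [1, 3, 5] and [0, 2, 4]. Each partition is sorted, but the last element of the first partition (5) is larger than the first element of the second (0), so the data as a whole is not in order.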

Re: Sorting partitions in Java

2014-05-20 Thread Sean Owen
On Tue, May 20, 2014 at 6:10 PM, Madhu wrote:
> What you suggest looks like an in-memory sort, which is fine if each partition is
> small enough to fit in memory. Is it true that rdd.sortByKey(...) requires
> partitions to fit in memory? I wasn't sure if there was some magic behind
> the scenes that su

Re: Sorting partitions in Java

2014-05-20 Thread Andrew Ash
Voted :) https://issues.apache.org/jira/browse/SPARK-983

On Tue, May 20, 2014 at 10:21 AM, Sandy Ryza wrote:
> There is: SPARK-545
>
> On Tue, May 20, 2014 at 10:16 AM, Andrew Ash wrote:
> > Sandy, is there a Jira ticket for that?
> >
> > On Tue, May 20, 2014 at 10:12 AM, Sandy Ryza

Re: Sorting partitions in Java

2014-05-20 Thread Sandy Ryza
There is: SPARK-545

On Tue, May 20, 2014 at 10:16 AM, Andrew Ash wrote:
> Sandy, is there a Jira ticket for that?
>
> On Tue, May 20, 2014 at 10:12 AM, Sandy Ryza wrote:
> > sortByKey currently requires partitions to fit in memory, but there are
> > plans to add external sort

Re: Sorting partitions in Java

2014-05-20 Thread Andrew Ash
Sandy, is there a Jira ticket for that?

On Tue, May 20, 2014 at 10:12 AM, Sandy Ryza wrote:
> sortByKey currently requires partitions to fit in memory, but there are
> plans to add external sort
>
> On Tue, May 20, 2014 at 10:10 AM, Madhu wrote:
> > Thanks Sean, I had seen that post you men

Re: Sorting partitions in Java

2014-05-20 Thread Sandy Ryza
sortByKey currently requires partitions to fit in memory, but there are plans to add external sort.

On Tue, May 20, 2014 at 10:10 AM, Madhu wrote:
> Thanks Sean, I had seen that post you mentioned.
>
> What you suggest looks like an in-memory sort, which is fine if each partition
> is small enough
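For readers unfamiliar with the term, an external sort bounds memory by sorting fixed-size chunks ("runs"), spilling each sorted run to disk, and then merging the runs. The sketch below is a generic external merge sort over longs in plain Java. It illustrates the technique only and makes no claim about how Spark's eventual implementation works; the class name, the `runSize` parameter, and the temp-file format are all hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongConsumer;

// Hypothetical sketch of a generic external merge sort (not Spark code).
// At most `runSize` input values are buffered in memory at a time.
class ExternalSortSketch {

    static void externalSort(Iterator<Long> input, int runSize, LongConsumer sink) {
        if (runSize <= 0) throw new IllegalArgumentException("runSize must be positive");
        List<Path> runs = new ArrayList<>();
        try {
            long[] buffer = new long[runSize];
            while (input.hasNext()) {
                // Fill the buffer with the next chunk, sort it, spill it to disk.
                int n = 0;
                while (n < runSize && input.hasNext()) {
                    buffer[n++] = input.next();
                }
                Arrays.sort(buffer, 0, n);
                runs.add(spill(buffer, n));
            }
            mergeRuns(runs, sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path run : runs) {
                try {
                    Files.deleteIfExists(run); // best-effort temp-file cleanup
                } catch (IOException ignored) {
                }
            }
        }
    }

    // Writes a count header followed by the first n values of buf to a temp file.
    private static Path spill(long[] buf, int n) throws IOException {
        Path file = Files.createTempFile("sort-run-", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(n);
            for (int i = 0; i < n; i++) {
                out.writeLong(buf[i]);
            }
        }
        return file;
    }

    // K-way merge: the queue holds the smallest unread value of each run as
    // {value, runIndex}, so memory grows with the number of runs, not the input.
    private static void mergeRuns(List<Path> runs, LongConsumer sink) throws IOException {
        List<DataInputStream> readers = new ArrayList<>();
        int[] remaining = new int[runs.size()];
        PriorityQueue<long[]> heads =
                new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
        try {
            for (int r = 0; r < runs.size(); r++) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(runs.get(r))));
                readers.add(in);
                remaining[r] = in.readInt();
                pushNext(heads, in, remaining, r);
            }
            while (!heads.isEmpty()) {
                long[] head = heads.poll();
                sink.accept(head[0]);
                int r = (int) head[1];
                pushNext(heads, readers.get(r), remaining, r);
            }
        } finally {
            for (DataInputStream in : readers) {
                in.close();
            }
        }
    }

    // Reads the next value of run r, if any, into the queue.
    private static void pushNext(PriorityQueue<long[]> heads, DataInputStream in,
                                 int[] remaining, int r) throws IOException {
        if (remaining[r] > 0) {
            remaining[r]--;
            heads.add(new long[] {in.readLong(), r});
        }
    }
}
```

Results are pushed to a `LongConsumer` rather than collected into a List, so the caller decides whether the sorted output is streamed onward or materialized; collecting it all would reintroduce the memory limit the spill was meant to avoid.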

Re: Sorting partitions in Java

2014-05-20 Thread Madhu
Thanks Sean, I had seen that post you mentioned. What you suggest looks like an in-memory sort, which is fine if each partition is small enough to fit in memory. Is it true that rdd.sortByKey(...) requires partitions to fit in memory? I wasn't sure if there was some magic behind the scenes that support

Re: Sorting partitions in Java

2014-05-20 Thread Sean Owen
It's an Iterator in both Java and Scala. In both cases you need to copy the stream of values into something List-like to sort it. An Iterable would not change that (not sure the API can promise many iterations anyway). If you just want the equivalent of "toArray", you can use a utility method in C
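The approach Sean describes (copy the partition's Iterator into something List-like, then sort) can be sketched in plain Java as below. The class and method names are hypothetical. In Spark's Java API, a function like this would be called from inside `mapPartitions`; as far as I know, in Spark 1.x the `FlatMapFunction` passed there returns an `Iterable` (so you would return the List itself), while from Spark 2.0 it returns an `Iterator`. As discussed in the thread, the whole partition must fit in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: in-memory sort of one partition's values.
class InMemoryPartitionSort {

    static <T> Iterator<T> sortPartition(Iterator<T> values, Comparator<? super T> cmp) {
        List<T> buffer = new ArrayList<>();
        values.forEachRemaining(buffer::add); // copy the stream of values into a List
        buffer.sort(cmp);                     // requires the whole partition in memory
        return buffer.iterator();             // hand back an Iterator, as mapPartitions expects
    }
}
```

Because an Iterator can only be consumed once, the copy is unavoidable for a comparison sort, which is the point made above about an Iterable not changing the picture.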