No, only each group should need to fit. On Wed, Feb 11, 2015 at 2:56 PM, Corey Nolet <cjno...@gmail.com> wrote:
> Doesn't iter still need to fit entirely into memory? > > On Wed, Feb 11, 2015 at 5:55 PM, Mark Hamstra <m...@clearstorydata.com> > wrote: > >> rdd.mapPartitions { iter => >> val grouped = iter.grouped(batchSize) >> for (group <- grouped) { ... } >> } >> >> On Wed, Feb 11, 2015 at 2:44 PM, Corey Nolet <cjno...@gmail.com> wrote: >> >>> I think the word "partition" here is a tad different than the term >>> "partition" that we use in Spark. Basically, I want something similar to >>> Guava's Iterables.partition [1], that is, If I have an RDD[People] and I >>> want to run an algorithm that can be optimized by working on 30 people at a >>> time, I'd like to be able to say: >>> >>> val rdd: RDD[People] = ..... >>> val partitioned: RDD[Seq[People]] = rdd.partition(30).... >>> >>> I also don't want any shuffling- everything can still be processed >>> locally. >>> >>> >>> [1] >>> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterables.html#partition(java.lang.Iterable,%20int) >>> >> >> >