Thanks Davies. I'll try it when it gets released (I am on 1.1.0 currently). For now I am using a custom partitioner with ShuffledRDD to keep each group's records together, so I don't have to shuffle all the data to a single partition.
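In case it helps others on this thread, here is a rough sketch of what the secondary-sort shape looks like once repartitionAndSortWithinPartitions() is available (Scala; the (group, time) key, the GroupPartitioner name and the partition count are placeholders I made up for illustration, not code from my job):

import org.apache.spark.{Partitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on 1.x)

// Partition on the group part of the (group, time) key only, so every record
// for a group lands in the same partition without shuffling everything to one.
class GroupPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (group: String, _) => (group.hashCode & Int.MaxValue) % numPartitions
    case other              => (other.hashCode & Int.MaxValue) % numPartitions
  }
}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("secondary-sort-sketch"))

    // Toy input: ((group, time), value) pairs.
    val records = sc.parallelize(Seq(
      (("g1", 3L), "c"), (("g2", 1L), "x"), (("g1", 1L), "a"), (("g1", 2L), "b")))

    // The shuffle itself sorts each partition by the full (group, time) key,
    // so no group has to be materialized in memory just to sort it.
    val sorted = records.repartitionAndSortWithinPartitions(new GroupPartitioner(8))

    // "Do the groupBy() yourself": within a partition the records for a group
    // are now contiguous and time-ordered, so they can be streamed group by group.
    sorted.mapPartitions { iter =>
      iter.map { case ((group, time), value) => s"$group\t$time\t$value" }
    }.collect().foreach(println)

    sc.stop()
  }
}

The point is that the partitioner only looks at the group, so a group never spans partitions, while the shuffle sorts each partition by the full key, so nothing has to be held in memory per group.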
On Thu, Oct 9, 2014 at 2:34 PM, Davies Liu <dav...@databricks.com> wrote:
> There is a new API called repartitionAndSortWithinPartitions() in
> master, it may help in this case, then you should do the `groupBy()`
> by yourself.
>
> On Wed, Oct 8, 2014 at 4:03 PM, chinchu <chinchu....@gmail.com> wrote:
> > Sean,
> >
> > I am having a similar issue, but I have a lot of data for a group & I
> > cannot materialize the iterable into a List or Seq in memory. [I tried &
> > it runs into OOM]. Is there any other way to do this?
> >
> > I also tried a secondary sort, with the key having the "group::time", but
> > the problem with that is the same group-name ends up in multiple
> > partitions & I am having to run sortByKey with one partition --
> > sortByKey(true, 1) -- which shuffles a lot of data.
> >
> > Thanks,
> > -C
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/GroupBy-Key-and-then-sort-values-with-the-group-tp14455p15990.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.