What if I don't want to use an aggregate function, only a groupByKeyLocally() and then a map transformation?
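Roughly, this is what I have in mind (just a sketch in spark-shell, where sc already exists; rdd1 and the sample data are made up, and I'm assuming all records for a key already sit in one partition):

    // group values by key inside each partition only -- no shuffle
    // assumes every key's records already live in a single partition
    val rdd1 = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3), ("b", 4)), 2)

    val groupedLocally = rdd1.mapPartitions(iter =>
      // toSeq materialises one partition; this groupBy never crosses partitions
      iter.toSeq.groupBy(_._1).iterator.map { case (k, kvs) => (k, kvs.map(_._2)) },
      preservesPartitioning = true)

    // the follow-up map transformation over the locally grouped values
    val mapped = groupedLocally.map { case (k, vs) => (k, vs.sum) }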
Will reduceByKeyLocally help here? Or is there any workaround, given that groupByKey is not local but global across all partitions?

Thanks

On Tue, Dec 1, 2015 at 5:20 PM, ayan guha <guha.a...@gmail.com> wrote:
> I believe reduceByKeyLocally was introduced for this purpose.
>
> On Tue, Dec 1, 2015 at 10:21 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi Rajat,
>>
>> My quick test showed that groupBy preserves the partitions:
>>
>> scala> sc.parallelize(Seq(0,0,0,0,1,1,1,1),2).map((_,1)).mapPartitionsWithIndex {
>>   case (idx, iter) => val s = iter.toSeq; println(idx + " with " + s.size + " elements: " + s); s.toIterator
>> }.groupBy(_._1).mapPartitionsWithIndex {
>>   case (idx, iter) => val s = iter.toSeq; println(idx + " with " + s.size + " elements: " + s); s.toIterator
>> }.collect
>>
>> 1 with 4 elements: Stream((1,1), (1,1), (1,1), (1,1))
>> 0 with 4 elements: Stream((0,1), (0,1), (0,1), (0,1))
>>
>> 0 with 1 elements: Stream((0,CompactBuffer((0,1), (0,1), (0,1), (0,1))))
>> 1 with 1 elements: Stream((1,CompactBuffer((1,1), (1,1), (1,1), (1,1))))
>>
>> Am I missing anything?
>>
>> Regards,
>> Jacek
>>
>> --
>> Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
>> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
>> Follow me at https://twitter.com/jaceklaskowski
>> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>>
>>
>> On Tue, Dec 1, 2015 at 2:46 AM, Rajat Kumar <rajatkumar10...@gmail.com> wrote:
>> > Hi,
>> >
>> > I have a JavaPairRDD<K,V> rdd1. I want to group rdd1 by keys but preserve
>> > the partitions of the original RDD, to avoid a shuffle, since I know all
>> > records with the same key are already in the same partition.
>> >
>> > The PairRDD is constructed using the Kafka streaming low-level consumer,
>> > which puts all records with the same key in the same partition. Can I
>> > group them together while avoiding a shuffle?
>> >
>> > Thanks
>>
>
> --
> Best Regards,
> Ayan Guha
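PS: For reference, a quick sketch (again in spark-shell, with made-up data) of what reduceByKeyLocally gives back -- if I read the API correctly, it needs a reduce function and returns a plain scala.collection.Map on the driver rather than an RDD, which is why I'm not sure it fits the no-aggregate-function case above:

    val rdd1 = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3), ("b", 4)), 2)

    // merges values per partition first, then combines the partial maps on the driver
    val merged = rdd1.reduceByKeyLocally(_ + _)
    // expected result (a local Scala Map, not an RDD): Map(a -> 3, b -> 7)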