Hi,

We can use combineByKey to achieve this:

    val finalRDD = tempRDD.combineByKey(
      (x: (Any, Any)) => x,
      (acc: (Any, Any), x) => (acc, x),
      (acc1: (Any, Any), acc2: (Any, Any)) => (acc1, acc2))

    finalRDD.collect.foreach(println)

    (amazon,((book1,tech),(book2,tech)))
    (barns&noble,(book,tech))
    (eBay,(book1,tech))

Thanks,
Sivakumar
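P.S. The call above nests tuples rather than building a list. If you want the RDD[(String, List[(String, String)])] shape from the original mail, a minimal sketch (my own variant, assuming the same tempRDD; the name listRDD is just for illustration) could be:

    // Sketch only: builds one List[(String, String)] per brand.
    val listRDD = tempRDD.combineByKey(
      (v: (String, String)) => List(v),                                 // createCombiner: start a list with the first value
      (acc: List[(String, String)], v: (String, String)) => v :: acc,   // mergeValue: add a value within a partition
      (acc1: List[(String, String)], acc2: List[(String, String)]) => acc1 ::: acc2)  // mergeCombiners: merge partitions

    listRDD.collect.foreach(println)
    // e.g. (amazon,List((book2,tech), (book1,tech)))  -- list order is not guaranteed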
-------- Original message --------
From: Daniel Haviv <daniel.ha...@veracity-group.com>
Date: 30/03/2016 18:58 (GMT+08:00)
To: Akhil Das <ak...@sigmoidanalytics.com>
Cc: Suniti Singh <suniti.si...@gmail.com>, user@spark.apache.org, dev <d...@spark.apache.org>
Subject: Re: aggregateByKey on PairRDD

Hi,
Shouldn't groupByKey be avoided (https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?

Thank you,
Daniel

On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

Isn't that what tempRDD.groupByKey does?

Thanks
Best Regards

On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <suniti.si...@gmail.com> wrote:

Hi All,

I have an RDD with data in the following form:

    tempRDD: RDD[(String, (String, String))]   // (brand, (product, key))

    ("amazon",("book1","tech"))
    ("eBay",("book1","tech"))
    ("barns&noble",("book","tech"))
    ("amazon",("book2","tech"))

I would like to group the data by brand and get the result set in the following format:

    resultSetRDD: RDD[(String, List[(String, String)])]

I tried using aggregateByKey but am not quite getting how to achieve this. Or is there any other way to achieve this?

    val resultSetRDD = tempRDD.aggregateByKey("")(
      { case (aggr, value) => aggr + String.valueOf(value) + "," },
      (aggr1, aggr2) => aggr1 + aggr2)

    // desired: resultSetRDD = (amazon,("book1","tech"),("book2","tech"))

Thanks,
Suniti
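For reference, a minimal sketch of the same aggregateByKey call with a List[(String, String)] accumulator in place of the String one (not from the thread; it assumes the tempRDD defined above) would be:

    // Sketch only: aggregateByKey with a List zero value instead of "".
    val resultSetRDD = tempRDD.aggregateByKey(List.empty[(String, String)])(
      (aggr, value) => value :: aggr,      // seqOp: add one (product, key) pair within a partition
      (aggr1, aggr2) => aggr1 ::: aggr2)   // combOp: merge the lists from different partitions

    resultSetRDD.collect.foreach(println)
    // e.g. (amazon,List((book2,tech), (book1,tech)))  -- element order is not guaranteed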