Hi,

Shouldn't groupByKey be avoided (https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?
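Here is a minimal sketch of the same grouping done with aggregateByKey. It assumes Spark's Scala RDD API running in local mode, and the object name `AggregateByBrand` and method `aggregateByBrand` are hypothetical. One caveat: when the aggregate is the full list of values, map-side combining barely shrinks the shuffle. The "prefer reduceByKey" advice pays off mostly when the result is smaller than its inputs (sums, counts, maxima). The order of pairs inside each list is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AggregateByBrand {
  // Builds (brand, List((product, key), ...)) with aggregateByKey.
  def aggregateByBrand(tempRDD: RDD[(String, (String, String))]): RDD[(String, List[(String, String)])] =
    tempRDD.aggregateByKey(List.empty[(String, String)])(
      // seqOp: prepend one value to the partial list within a partition
      (acc, value) => value :: acc,
      // combOp: concatenate partial lists coming from different partitions
      (acc1, acc2) => acc1 ::: acc2)

  def main(args: Array[String]): Unit = {
    // Local mode for illustration only; the app name is hypothetical.
    val sc = new SparkContext(new SparkConf().setAppName("aggregate-by-brand").setMaster("local[*]"))
    try {
      // (brand, (product, key)) sample records from the question
      val tempRDD = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))))
      aggregateByBrand(tempRDD).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```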
Thank you,
Daniel

On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Isn't that what tempRDD.groupByKey does?
>
> Thanks
> Best Regards
>
> On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <suniti.si...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have an RDD with data in the following form:
>>
>> tempRDD: RDD[(String, (String, String))]
>>
>> (brand, (product, key))
>>
>> ("amazon",("book1","tech"))
>>
>> ("eBay",("book1","tech"))
>>
>> ("barns&noble",("book","tech"))
>>
>> ("amazon",("book2","tech"))
>>
>>
>> I would like to group the data by brand and get the result
>> set in the following format:
>>
>> resultSetRDD : RDD[(String, List[(String, String)])]
>>
>> I tried using aggregateByKey but can't quite see how to achieve
>> this. Or is there another way to achieve this?
>>
>> val resultSetRDD = tempRDD.aggregateByKey("")({case (aggr , value) =>
>> aggr + String.valueOf(value) + ","}, (aggr1, aggr2) => aggr1 + aggr2)
>>
>> resultSetRDD = (amazon,("book1","tech"),("book2","tech"))
>>
>> Thanks,
>>
>> Suniti
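For comparison, a minimal sketch of the groupByKey route suggested in the thread, again assuming Spark's Scala RDD API in local mode; `GroupByKeyByBrand` and `groupByBrand` are hypothetical names. groupByKey returns RDD[(String, Iterable[(String, String)])], so mapValues(_.toList) converts each group into the List type asked for. All values for a single brand must fit in memory on one executor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupByKeyByBrand {
  // groupByKey collects every (product, key) pair per brand;
  // mapValues(_.toList) turns the Iterable into a List.
  def groupByBrand(tempRDD: RDD[(String, (String, String))]): RDD[(String, List[(String, String)])] =
    tempRDD.groupByKey().mapValues(_.toList)

  def main(args: Array[String]): Unit = {
    // Local mode for illustration only; the app name is hypothetical.
    val sc = new SparkContext(new SparkConf().setAppName("group-by-brand").setMaster("local[*]"))
    try {
      val tempRDD = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))))
      groupByBrand(tempRDD).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```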