Re: aggregateByKey on PairRDD

Daniel Haviv Wed, 30 Mar 2016 03:59:33 -0700

Hi,
Shouldn't groupByKey be avoided? See:
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
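
For reference, here is a minimal sketch of how aggregateByKey could build a
List per brand instead of a concatenated String. The object name
BrandGrouping, the helper groupByBrand, and the local[*] master are
assumptions made for illustration only; they are not from the original post.
The order of elements inside each list is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, used only to make the sketch self-contained.
object BrandGrouping {

  // Collects all (product, key) pairs for each brand into a List.
  def groupByBrand(
      rdd: RDD[(String, (String, String))]): RDD[(String, List[(String, String)])] =
    rdd.aggregateByKey(List.empty[(String, String)])(
      // seqOp: prepend one value to the partial list within a partition
      (acc, value) => value :: acc,
      // combOp: concatenate the partial lists coming from different partitions
      (left, right) => left ::: right
    )

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("brand-grouping")
    val sc = new SparkContext(conf)

    val tempRDD = sc.parallelize(Seq(
      ("amazon", ("book1", "tech")),
      ("eBay", ("book1", "tech")),
      ("barns&noble", ("book", "tech")),
      ("amazon", ("book2", "tech"))
    ))

    groupByBrand(tempRDD).collect().foreach(println)
    sc.stop()
  }
}
```

As far as I understand, since every value has to be kept, this shuffles
roughly as much data as groupByKey does. The advice in the link matters most
when the values are actually reduced to something smaller per key.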



Thank you,
Daniel

On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Isn't it what tempRDD.groupByKey does?
>
> Thanks
> Best Regards
>
> On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <suniti.si...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have an RDD with data in the following form:
>>
>> tempRDD: RDD[(String, (String, String))]
>>
>> (brand , (product, key))
>>
>> ("amazon",("book1","tech"))
>>
>> ("eBay",("book1","tech"))
>>
>> ("barns&noble",("book","tech"))
>>
>> ("amazon",("book2","tech"))
>>
>>
>> I would like to group the data by brand and get the result set in the
>> following format:
>>
>> resultSetRDD: RDD[(String, List[(String, String)])]
>>
>> I tried using aggregateByKey but can't quite work out how to achieve
>> this. Or is there another way to achieve it?
>>
>> val resultSetRDD  = tempRDD.aggregateByKey("")({case (aggr , value) =>
>> aggr + String.valueOf(value) + ","}, (aggr1, aggr2) => aggr1 + aggr2)
>>
>> resultSetRDD = (amazon,("book1","tech"),("book2","tech"))
>>
>> Thanks,
>>
>> Suniti
>>
>
>
