Hi,
I have used union in a similar case and applied reduceByKey on it.
Union + reduceByKey can stand in for a join, but you will first have to map
all the values so that they are of the same datatype.
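
A rough sketch of that approach (purely illustrative: infoRdd, rawRdd, aggrRdd
and the CustomerData wrapper are hypothetical names, not from the original
mail; Java 8 lambdas assumed):

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;

    // Wrapper bean so that all three RDDs share one value type.
    class CustomerData implements Serializable {
        List<TransactionInfo> infos = new ArrayList<>();
        List<TransactionRaw> raws = new ArrayList<>();
        TransactionAggr aggr;

        // Combine the data collected for one customer from different RDDs.
        CustomerData merge(CustomerData other) {
            infos.addAll(other.infos);
            raws.addAll(other.raws);
            if (aggr == null) aggr = other.aggr;
            return this;
        }
    }

    // Map each RDD to the common value type...
    JavaPairRDD<Long, CustomerData> fromInfo = infoRdd.mapValues(it -> {
        CustomerData d = new CustomerData(); it.forEach(d.infos::add); return d;
    });
    JavaPairRDD<Long, CustomerData> fromRaw = rawRdd.mapValues(it -> {
        CustomerData d = new CustomerData(); it.forEach(d.raws::add); return d;
    });
    JavaPairRDD<Long, CustomerData> fromAggr = aggrRdd.mapValues(a -> {
        CustomerData d = new CustomerData(); d.aggr = a; return d;
    });

    // ...then union them and collapse to a single record per CustomerId.
    JavaPairRDD<Long, CustomerData> merged =
            fromInfo.union(fromRaw).union(fromAggr)
                    .reduceByKey(CustomerData::merge);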

Regards,

Sushrut Ikhar
https://about.me/sushrutikhar


On Tue, Dec 1, 2015 at 3:34 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> I think you should be able to join different RDDs with the same key. Have you
> tried that?
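>
> For example (a quick sketch with hypothetical variable names; join only needs
> the key types to match, the value types can differ):
>
>     // infoRdd: JavaPairRDD<Long, Iterable<TransactionInfo>>
>     // rawRdd:  JavaPairRDD<Long, Iterable<TransactionRaw>>
>     // Tuple2 is scala.Tuple2
>     JavaPairRDD<Long, Tuple2<Iterable<TransactionInfo>, Iterable<TransactionRaw>>>
>             joined = infoRdd.join(rawRdd);
>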
> On Dec 1, 2015 3:30 PM, "Praveen Chundi" <mail.chu...@gmail.com> wrote:
>
>> cogroup could be useful to you, since all three are PairRDDs.
>>
>>
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
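>>
>> A rough sketch of what that could look like (hypothetical variable names for
>> the three RDDs; cogroup collects, per key, all values from each RDD):
>>
>>     // infoRdd: JavaPairRDD<Long, Iterable<TransactionInfo>>
>>     // rawRdd:  JavaPairRDD<Long, Iterable<TransactionRaw>>
>>     // aggrRdd: JavaPairRDD<Long, TransactionAggr>
>>     // Tuple3 is scala.Tuple3
>>     JavaPairRDD<Long, Tuple3<Iterable<Iterable<TransactionInfo>>,
>>                              Iterable<Iterable<TransactionRaw>>,
>>                              Iterable<TransactionAggr>>> merged =
>>             infoRdd.cogroup(rawRdd, aggrRdd);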
>>
>> Best Regards,
>> Praveen
>>
>>
>> On 01.12.2015 10:47, Shams ul Haque wrote:
>>
>>> Hi All,
>>>
>>> I have made 3 RDDs from 3 different datasets; all RDDs are grouped by
>>> CustomerId, where 2 RDDs have values of Iterable type and one has a single
>>> bean. All RDDs have an id of Long type as the CustomerId. Below are the
>>> models for the 3 RDDs:
>>> JavaPairRDD<Long, Iterable<TransactionInfo>>
>>> JavaPairRDD<Long, Iterable<TransactionRaw>>
>>> JavaPairRDD<Long, TransactionAggr>
>>>
>>> Now, I have to merge these 3 RDDs into a single one so that I can
>>> generate an Excel report per customer using the data in the 3 RDDs.
>>> I tried using the join transformation, but it needs RDDs of the same type
>>> and it only works for two RDDs.
>>> So my questions are:
>>> 1. Is there any way to do this merging task efficiently, so that I can
>>> get all 3 datasets by CustomerId?
>>> 2. If I merge the first two using the join transformation, do I need to run
>>> groupByKey() before the join so that all data related to a single customer
>>> will be on one node?
>>>
>>>
>>> Thanks
>>> Shams
>>>
>>
