Hi, I have used union in a similar case and applied reduceByKey on the result. union + reduceByKey can stand in for a join, but you will first have to map each RDD so that all values are of the same datatype.
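A minimal sketch of that idea (names like infoRdd/rawRdd/aggrRdd and the CustomerReport wrapper are illustrative, not from this thread); merge() is assumed to combine two partial reports for the same CustomerId:

import org.apache.spark.api.java.JavaPairRDD;

// Hypothetical wrapper that can carry any of the three value types.
class CustomerReport implements java.io.Serializable {
    Iterable<TransactionInfo> infos;
    Iterable<TransactionRaw> raws;
    TransactionAggr aggr;

    // Combine two partial reports for the same customer into a new one.
    CustomerReport merge(CustomerReport other) {
        CustomerReport out = new CustomerReport();
        out.infos = (infos != null) ? infos : other.infos;
        out.raws  = (raws  != null) ? raws  : other.raws;
        out.aggr  = (aggr  != null) ? aggr  : other.aggr;
        return out;
    }
}

// Map every RDD to the common value type, union them, then reduce per CustomerId.
JavaPairRDD<Long, CustomerReport> merged =
    infoRdd.mapValues(it -> { CustomerReport r = new CustomerReport(); r.infos = it; return r; })
        .union(rawRdd.mapValues(it -> { CustomerReport r = new CustomerReport(); r.raws = it; return r; }))
        .union(aggrRdd.mapValues(a  -> { CustomerReport r = new CustomerReport(); r.aggr = a; return r; }))
        .reduceByKey((a, b) -> a.merge(b));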
Regards,
Sushrut Ikhar
about.me/sushrutikhar <https://about.me/sushrutikhar?promo=email_sig>

On Tue, Dec 1, 2015 at 3:34 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> I think you should be able to join different RDDs with the same key. Have
> you tried that?
>
> On Dec 1, 2015 3:30 PM, "Praveen Chundi" <mail.chu...@gmail.com> wrote:
>
>> cogroup could be useful to you, since all three are PairRDDs.
>>
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
>>
>> Best Regards,
>> Praveen
>>
>> On 01.12.2015 10:47, Shams ul Haque wrote:
>>
>>> Hi All,
>>>
>>> I have made 3 RDDs from 3 different datasets, all grouped by CustomerID.
>>> Two of the RDDs have values of Iterable type and one holds a single bean.
>>> All RDDs use an id of Long type as the CustomerId. Below are the models
>>> for the 3 RDDs:
>>> JavaPairRDD<Long, Iterable<TransactionInfo>>
>>> JavaPairRDD<Long, Iterable<TransactionRaw>>
>>> JavaPairRDD<Long, TransactionAggr>
>>>
>>> Now I have to merge these 3 RDDs into a single one so that I can generate
>>> an Excel report per customer using the data from all 3 RDDs. I tried the
>>> join transformation, but it needs RDDs of the same type and only works on
>>> two RDDs.
>>> So my questions are:
>>> 1. Is there any way to do this merging efficiently, so that I can get all
>>> 3 datasets by CustomerId?
>>> 2. If I merge the first two using a join, do I need to run groupByKey()
>>> before the join so that all data related to a single customer will be on
>>> one node?
>>>
>>> Thanks
>>> Shams
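For completeness, a minimal sketch of the cogroup route Praveen suggests above, using the same illustrative variable names for the three RDDs from the question (the extra Iterable nesting on the first two components comes from cogroup grouping each RDD's values per key):

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple3;

// cogroup brings all values for each CustomerId together in one pass,
// without requiring the three RDDs to share a value type.
JavaPairRDD<Long, Tuple3<Iterable<Iterable<TransactionInfo>>,
                         Iterable<Iterable<TransactionRaw>>,
                         Iterable<TransactionAggr>>> byCustomer =
    infoRdd.cogroup(rawRdd, aggrRdd);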