This is my design. Now let me try to code it in Spark.

rdd1.txt
=========
1~4,5,6,7
2~4,5
3~6,7

rdd2.txt
========
4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010

TRANSFORM 1
===========
map each value to its key (like an inverted index)
4~1
5~1
6~1
7~1
5~2
4~2
6~3
7~3

TRANSFORM 2
===========
join the keys from transform 1 with rdd2
4~1,1001,1000,1002,1003
4~2,1001,1000,1002,1003
5~1,1004,1001,1006,1007
5~2,1004,1001,1006,1007
6~1,1007,1009,1005,1008
6~3,1007,1009,1005,1008
7~1,1011,1012,1013,1010
7~3,1011,1012,1013,1010

TRANSFORM 3
===========
split the key in transform 2 on "~" and keep key(1), i.e. 1, 2, 3
1~1001,1000,1002,1003
2~1001,1000,1002,1003
1~1004,1001,1006,1007
2~1004,1001,1006,1007
1~1007,1009,1005,1008
3~1007,1009,1005,1008
1~1011,1012,1013,1010
3~1011,1012,1013,1010

TRANSFORM 4
===========
group by key, concatenating the value lists
1~1001,1000,1002,1003,1004,1001,1006,1007,1007,1009,1005,1008,1011,1012,1013,1010
2~1001,1000,1002,1003,1004,1001,1006,1007
3~1007,1009,1005,1008,1011,1012,1013,1010
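A minimal sketch of the four transforms above, assuming Scala and the plain RDD API (flatMap for the inverted index, join, a re-key map, then reduceByKey). The distinct/sorted step at the end is my assumption, added to match the deduplicated, sorted lists in the expected output quoted below.

import org.apache.spark.{SparkConf, SparkContext}

object JoinByValues {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("join-by-values").setMaster("local[*]"))

    // Same data as rdd1.txt / rdd2.txt above.
    val rdd1 = sc.parallelize(Seq(
      (1, Seq(4, 5, 6, 7)),
      (2, Seq(4, 5)),
      (3, Seq(6, 7))))
    val rdd2 = sc.parallelize(Seq(
      (4, Seq(1001, 1000, 1002, 1003)),
      (5, Seq(1004, 1001, 1006, 1007)),
      (6, Seq(1007, 1009, 1005, 1008)),
      (7, Seq(1011, 1012, 1013, 1010))))

    // TRANSFORM 1: invert rdd1 so each value points back at its key.
    val inverted = rdd1.flatMap { case (k, vs) => vs.map(v => (v, k)) }

    // TRANSFORM 2: join the inverted index with rdd2 on the shared key,
    // yielding e.g. (4, (1, Seq(1001, 1000, 1002, 1003))).
    val joined = inverted.join(rdd2)

    // TRANSFORM 3: drop the join key, keep the original rdd1 key.
    val rekeyed = joined.map { case (_, (k, vs)) => (k, vs) }

    // TRANSFORM 4: concatenate all value lists per key, then dedup/sort
    // (the dedup/sort is my addition to match the expected joinedRdd).
    val result = rekeyed.reduceByKey(_ ++ _).mapValues(_.distinct.sorted)

    result.collect().foreach { case (k, vs) =>
      println(s"$k -> ${vs.mkString("[", ",", "]")}")
    }
    sc.stop()
  }
}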
From: dcmovva <dilip.mo...@gmail.com>
To: user@spark.apache.org
Sent: Saturday, January 3, 2015 10:10 AM
Subject: Joining by values

I have two pair RDDs in Spark like this:

rdd1 = (1 -> [4,5,6,7])
       (2 -> [4,5])
       (3 -> [6,7])

rdd2 = (4 -> [1001,1000,1002,1003])
       (5 -> [1004,1001,1006,1007])
       (6 -> [1007,1009,1005,1008])
       (7 -> [1011,1012,1013,1010])

I would like to combine them to look like this:

joinedRdd = (1 -> [1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
            (2 -> [1000,1001,1002,1003,1004,1006,1007])
            (3 -> [1005,1007,1008,1009,1010,1011,1012,1013])

Can someone suggest how to do this?

Thanks
Dilip