I am sorry, the last line in the code is

file1Rdd.join(file2RddGrp.mapValues(names => names.toSet)).collect().foreach(println)

so

My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file1Rdd.join(file2RddGrp.mapValues(names => names.toSet)).collect().foreach(println)

Result
=======
(4,(ringo,Set(With a Little Help From My Friends, Octopus's Garden)))
(2,(john,Set(Julia, Nowhere Man)))
(3,(george,Set(While My Guitar Gently Weeps, Norwegian Wood)))
(1,(paul,Set(Yesterday, Michelle)))

Again, the question is: how do I extract values from the Set?

thanks

sanjay

From: Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
To: Arun Ahuja <aahuj...@gmail.com>; Andrew Ash <and...@andrewash.com>
Cc: user <user@spark.apache.org>
Sent: Friday, November 21, 2014 10:41 AM
Subject: Extracting values from a Collecion

hey guys

names.txt
=========
1,paul
2,john
3,george
4,ringo

songs.txt
=========
1,Yesterday
2,Julia
3,While My Guitar Gently Weeps
4,With a Little Help From My Friends
1,Michelle
2,Nowhere Man
3,Norwegian Wood
4,Octopus's Garden

What I want to do is real simple.

Desired Output
==============
(4,(With a Little Help From My Friends, Octopus's Garden))
(2,(Julia, Nowhere Man))
(3,(While My Guitar Gently Weeps, Norwegian Wood))
(1,(Yesterday, Michelle))

My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file2Rdd.groupByKey().mapValues(names => names.toSet).collect().foreach(println)

Result
=======
(4,Set(With a Little Help From My Friends, Octopus's Garden))
(2,Set(Julia, Nowhere Man))
(3,Set(While My Guitar Gently Weeps, Norwegian Wood))
(1,Set(Yesterday, Michelle))

How can I extract values from the Set?

Thanks

sanjay
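
One possible way to get from the Set(...) output to the desired (...) form is to turn each Set into a String with mkString before printing. The sketch below is only an illustration under some assumptions: it uses plain Scala collections in place of the RDDs so it runs without a Spark cluster, the song lines are inlined rather than read from songs.txt, and formatSet is a hypothetical helper name that does not appear in the original code.

```scala
// Plain Scala collections stand in for the RDDs (assumption: no SparkContext here).
val songLines = Seq(
  "1,Yesterday",
  "2,Julia",
  "3,While My Guitar Gently Weeps",
  "4,With a Little Help From My Friends",
  "1,Michelle",
  "2,Nowhere Man",
  "3,Norwegian Wood",
  "4,Octopus's Garden"
)

// Split each line once into an (id, title) pair.
val pairs: Seq[(String, String)] = songLines.map { line =>
  val parts = line.split(",", 2)
  (parts(0), parts(1))
}

// Local stand-in for groupByKey().mapValues(names => names.toSet).
val grouped: Map[String, Set[String]] =
  pairs.groupBy(_._1).map { case (id, kvs) => (id, kvs.map(_._2).toSet) }

// Hypothetical helper: render the members of a Set as "(a, b)" instead of "Set(a, b)".
def formatSet(names: Set[String]): String = names.mkString("(", ", ", ")")

// Print one "(id,(title, title))" line per id. A Set does not promise any
// particular element order, so the titles may come out in either order.
grouped.toSeq.sortBy(_._1).foreach { case (id, names) =>
  println(s"($id,${formatSet(names)})")
}
```

In the Spark job itself, the same idea would presumably be to replace names.toSet with names.toSet.mkString("(", ", ", ")") inside mapValues. For the joined RDD, where each value is a (name, Set) pair, something like mapValues { case (_, songs) => songs.mkString("(", ", ", ")") } should drop the name and keep only the titles. Neither variant has been run against the original data.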