Thanks Akhil. As you suggested, I will have to go the keyBy() route, as I need the columns intact. But will keyBy() accept multiple fields (e.g. x(0), x(1))?
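Something like the below is what I have in mind (untested sketch; I am assuming keyBy can take a function returning a tuple, and that x(0) and x(1) are the key columns):

scala> val keyed1 = input1.keyBy(x => (x(0), x(1)))  // RDD[((String, String), Array[String])]
scala> val keyed2 = input2.keyBy(x => (x(0), x(1)))
scala> keyed1.join(keyed2).take(10)                  // joins on the (x(0), x(1)) key; full rows kept as values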
Thanks
Amit

On Tue, Jun 9, 2015 at 12:26 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Try this way:
>
> scala> val input1 = sc.textFile("/test7").map(line =>
>   line.split(",").map(_.trim))
> scala> val input2 = sc.textFile("/test8").map(line =>
>   line.split(",").map(_.trim))
> scala> val input11 = input1.map(x => ((x(0) + x(1)), x(2), x(3)))
> scala> val input22 = input2.map(x => ((x(0) + x(1)), x(2), x(3)))
>
> scala> input11.join(input22).take(10)
>
> PairRDDFunctions basically requires RDD[(K, V)], and in your case it is
> ((String, String), String, String). You can also look at keyBy if you
> don't want to concatenate your keys.
>
> Thanks
> Best Regards
>
> On Tue, Jun 9, 2015 at 10:14 AM, amit tewari <amittewar...@gmail.com> wrote:
>
>> Hi Dear Spark Users
>>
>> I am very new to Spark/Scala.
>>
>> I am using Datastax (4.7 / Spark 1.2.1) and struggling with the
>> following error/issue.
>>
>> I have already tried options like import org.apache.spark.SparkContext._
>> and the explicit import
>> org.apache.spark.SparkContext.rddToPairRDDFunctions, but the error is
>> not resolved.
>>
>> Help much appreciated.
>>
>> Thanks
>> AT
>>
>> scala> val input1 = sc.textFile("/test7").map(line =>
>>   line.split(",").map(_.trim))
>> scala> val input2 = sc.textFile("/test8").map(line =>
>>   line.split(",").map(_.trim))
>> scala> val input11 = input1.map(x => ((x(0), x(1)), x(2), x(3)))
>> scala> val input22 = input2.map(x => ((x(0), x(1)), x(2), x(3)))
>>
>> scala> input11.join(input22).take(10)
>>
>> <console>:66: error: value join is not a member of
>> org.apache.spark.rdd.RDD[((String, String), String, String)]
>>
>>   input11.join(input22).take(10)
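P.S. If keyBy does not pan out, I suppose grouping the non-key columns into a value tuple would also make join available, since join needs RDD[(K, V)] rather than a 3-tuple (untested sketch):

scala> val input11 = input1.map(x => ((x(0), x(1)), (x(2), x(3))))  // RDD[((String, String), (String, String))]
scala> val input22 = input2.map(x => ((x(0), x(1)), (x(2), x(3))))
scala> input11.join(input22).take(10)  // now a pair RDD, so join resolves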