Thanks Akhil. As you suggested, I have to go the keyBy() route, as I need the
columns intact.
But will keyBy() accept multiple fields (e.g. x(0), x(1))?
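
(A minimal, untested sketch, assuming the split-and-trim RDDs from the thread
below: keyBy takes an arbitrary key function, so a function returning a tuple
gives a composite key while keeping the whole row intact as the value;
keyed1/keyed2 are illustrative names.)

scala> val keyed1 = input1.keyBy(x => (x(0), x(1)))
scala> val keyed2 = input2.keyBy(x => (x(0), x(1)))
scala> keyed1.join(keyed2).take(10)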

Thanks
Amit

On Tue, Jun 9, 2015 at 12:26 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Try this way:
>
> scala>val input1 = sc.textFile("/test7").map(line =>
> line.split(",").map(_.trim));
> scala>val input2 = sc.textFile("/test8").map(line =>
> line.split(",").map(_.trim));
> scala> val input11 = input1.map(x => (x(0) + x(1), (x(2), x(3))))
> scala> val input22 = input2.map(x => (x(0) + x(1), (x(2), x(3))))
>
> scala> input11.join(input22).take(10)
>
>
> PairRDDFunctions basically requires an RDD[(K, V)], and in your case it's
> an RDD of ((String, String), String, String), a 3-tuple rather than a
> key/value pair. You can also look into keyBy if you don't want to
> concatenate your keys.
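>
> For example (a minimal, untested sketch; keyed1 is an illustrative name),
> keyBy pairs each row with the key you compute from it, leaving the row
> itself intact as the value:
>
> scala> val keyed1 = input1.keyBy(x => x(0))
> // keyed1: org.apache.spark.rdd.RDD[(String, Array[String])]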
>
> Thanks
> Best Regards
>
> On Tue, Jun 9, 2015 at 10:14 AM, amit tewari <amittewar...@gmail.com>
> wrote:
>
>> Hi Spark users,
>>
>> I am very new to Spark/Scala.
>>
>> I am using DataStax (4.7 / Spark 1.2.1) and am struggling with the
>> following error/issue.
>>
>> I have already tried options like import org.apache.spark.SparkContext._
>> and the explicit import org.apache.spark.SparkContext.rddToPairRDDFunctions,
>> but the error is not resolved.
>>
>> Help much appreciated.
>>
>> Thanks
>> AT
>>
>> scala>val input1 = sc.textFile("/test7").map(line =>
>> line.split(",").map(_.trim));
>> scala>val input2 = sc.textFile("/test8").map(line =>
>> line.split(",").map(_.trim));
>> scala>val input11 = input1.map(x=>((x(0),x(1)),x(2),x(3)))
>> scala>val input22 = input2.map(x=>((x(0),x(1)),x(2),x(3)))
>>
>> scala> input11.join(input22).take(10)
>>
>> <console>:66: error: value join is not a member of
>> org.apache.spark.rdd.RDD[((String, String), String, String)]
>>
>>               input11.join(input22).take(10)
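>>
>> (A minimal, untested sketch of one possible fix: join is only defined via
>> PairRDDFunctions on an RDD[(K, V)], so nesting the two value columns into
>> a pair makes the RDD a key/value RDD and the implicit conversion applies.)
>>
>> scala> val input11 = input1.map(x => ((x(0), x(1)), (x(2), x(3))))
>> scala> val input22 = input2.map(x => ((x(0), x(1)), (x(2), x(3))))
>> scala> input11.join(input22).take(10)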
>>
