Re: How can I apply such an inner join in Spark Scala/Python

2014-11-17 Thread Akhil Das
Simple join would do it.

import org.apache.spark.rdd.PairRDDFunctions

val a: List[(Int, Int)] = List((1, 2), (2, 4), (3, 6))
val b: List[(Int, Int)] = List((1, 3), (2, 5), (3, 6), (4, 5), (5, 6))
val A = sparkContext.parallelize(a)
val B = sparkContext.parallelize(b)
val ac = new PairRDDFunctions[Int, Int](A)
val C = ac.join(B)
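As a quick sanity check of the snippet above (assuming it runs against a live SparkContext), collecting C yields exactly the pairs the original question asks for:

C.collect()  // order may vary: Array((1,(2,3)), (2,(4,5)), (3,(6,6)))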

Re: How can I apply such an inner join in Spark Scala/Python

2014-11-17 Thread Sean Owen
Just RDD.join() should be an inner join.

On Mon, Nov 17, 2014 at 5:51 PM, Blind Faith wrote:
> So let us say I have RDDs A and B with the following values.
>
> A = [ (1, 2), (2, 4), (3, 6) ]
>
> B = [ (1, 3), (2, 5), (3, 6), (4, 5), (5, 6) ]
>
> I want to apply an inner join, such that I get the following as a result.
>
> C = [ (1, (2, 3)), (2, (4, 5)), (3, (6, 6)) ]
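A minimal sketch of Sean's suggestion, assuming a Spark 1.x shell where a SparkContext named sc is already in scope: importing SparkContext._ brings the pair-RDD implicits into scope, so join is available directly on any RDD of pairs and the explicit PairRDDFunctions wrapper is unnecessary.

import org.apache.spark.SparkContext._  // pair-RDD implicits (needed in Spark 1.x)

val A = sc.parallelize(List((1, 2), (2, 4), (3, 6)))
val B = sc.parallelize(List((1, 3), (2, 5), (3, 6), (4, 5), (5, 6)))
val C = A.join(B)  // inner join on keys: keeps only keys present in both RDDs
C.collect()        // order may vary: Array((1,(2,3)), (2,(4,5)), (3,(6,6)))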

How can I apply such an inner join in Spark Scala/Python

2014-11-17 Thread Blind Faith
So let us say I have RDDs A and B with the following values.

A = [ (1, 2), (2, 4), (3, 6) ]

B = [ (1, 3), (2, 5), (3, 6), (4, 5), (5, 6) ]

I want to apply an inner join, such that I get the following as a result.

C = [ (1, (2, 3)), (2, (4, 5)), (3, (6, 6)) ]

That is, those keys which are not present in both RDDs should be dropped. How can I apply such an inner join in Spark, in Scala or Python?