Just RDD.join() should be an inner join.

On Mon, Nov 17, 2014 at 5:51 PM, Blind Faith <person.of.b...@gmail.com> wrote:
> So let us say I have RDDs A and B with the following values.
>
> A = [ (1, 2), (2, 4), (3, 6) ]
>
> B = [ (1, 3), (2, 5), (3, 6), (4, 5), (5, 6) ]
>
> I want to apply an inner join, such that I get the following as a result.
>
> C = [ (1, (2, 3)), (2, (4, 5)), (3, (6, 6)) ]
>
> That is, those keys which are not present in A should disappear after the
> left inner join.
>
> How can I achieve that? I can see outerJoin functions but no innerJoin
> functions in the Spark RDD class.
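
To make the inner-join semantics concrete, here is a rough plain-Python sketch on the A and B from the question. It is not Spark code: `inner_join` is a hypothetical helper written only for illustration, and it simply pairs up values whose keys appear on both sides, which is the behaviour expected from join() on pair RDDs.

```python
from collections import defaultdict

def inner_join(left, right):
    """Hypothetical helper: pair values from two lists of (key, value)
    tuples, keeping only keys that appear in both lists."""
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    # A key missing from either side produces no output pair.
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]

A = [(1, 2), (2, 4), (3, 6)]
B = [(1, 3), (2, 5), (3, 6), (4, 5), (5, 6)]

C = inner_join(A, B)
print(C)

# In PySpark, assuming an existing SparkContext named `sc`, the
# equivalent call would be:
#   sc.parallelize(A).join(sc.parallelize(B)).collect()
# The order of the collected pairs is not guaranteed there.
```

Keys 4 and 5 exist only in B, so they do not appear in C.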