Hi, this question has probably been answered on the mailing list before, but I couldn't find it. Sorry for posting it again.
I need to join three different RDDs and apply filtering to them, and I wonder which of the following alternatives is more efficient:
1- First join all three RDDs and then filter the resulting joined RDD, or
2- Apply the filtering to each individual RDD and then join the resulting RDDs.
Or is there probably no difference, due to lazy evaluation and Spark's underlying optimisation?

best,
/Shahab
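
P.S. To make the two alternatives concrete, here is a minimal sketch in plain Python rather than Spark itself. The data, the `join` helper and the `keep` predicate are all hypothetical stand-ins I made up for illustration, and I am assuming the filter depends only on the join key, so that it can be applied either before or after the join. It only shows that both orderings give the same result while the intermediate sizes differ; it says nothing about lazy evaluation or about what Spark actually does internally.

```python
from collections import defaultdict


def join(left, right):
    """Inner join of two lists of (key, value) pairs, similar in shape to an RDD join by key."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]


def keep(key):
    # Hypothetical predicate: depends only on the key (an assumption of this sketch).
    return key % 2 == 0


# Hypothetical data standing in for the three RDDs.
a = [(i, i) for i in range(10)]
b = [(i, i * 10) for i in range(10)]
c = [(i, i * 100) for i in range(10)]

# Alternative 1: join all three first, then filter the joined result.
joined_all = join(join(a, b), c)
alt1 = [(k, v) for k, v in joined_all if keep(k)]

# Alternative 2: filter each input first, then join the filtered inputs.
fa = [(k, v) for k, v in a if keep(k)]
fb = [(k, v) for k, v in b if keep(k)]
fc = [(k, v) for k, v in c if keep(k)]
alt2 = join(join(fa, fb), fc)

print(alt1 == alt2)                 # same final records either way
print(len(joined_all), len(alt2))   # records produced by the joins in each alternative
```

In this toy version, alternative 1 pushes all 10 records per input through both joins, while alternative 2 joins only the 5 records per input that pass the filter. My question is whether the same difference shows up in Spark.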
