Hi,

This question has probably been answered on the mailing list before,
but I couldn't find it. Sorry for posting it again.

I need to join and filter three different RDDs, and I wonder which of
the following alternatives is more efficient:
1- First join all three RDDs and then filter the resulting joined RDD
  or
2- Filter each individual RDD first and then join the resulting RDDs
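
To make the two alternatives concrete, here is a minimal sketch in plain
Python (not Spark code). The data, the keep() predicate and the join()
helper are all made up for illustration, and it assumes the filter depends
only on the join key, so it can be applied to each input separately:

```python
# Hypothetical data: three small keyed collections standing in for the RDDs.
a = [(k, f"a{k}") for k in range(10)]
b = [(k, f"b{k}") for k in range(10)]
c = [(k, f"c{k}") for k in range(10)]


def keep(key):
    # Hypothetical filter predicate; assumed to look only at the join key.
    return key % 2 == 0


def join(left, right):
    """Inner join of two lists of (key, value) pairs on the key."""
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]


# Alternative 1: join all three first, then filter the joined result.
joined_all = join(join(a, b), c)
result_1 = [(k, v) for k, v in joined_all if keep(k)]

# Alternative 2: filter each input first, then join the filtered inputs.
fa = [(k, v) for k, v in a if keep(k)]
fb = [(k, v) for k, v in b if keep(k)]
fc = [(k, v) for k, v in c if keep(k)]
result_2 = join(join(fa, fb), fc)

# Both orders give the same rows; what differs is how many rows
# go through the join step.
print(result_1 == result_2)
print(len(joined_all), len(result_2))
```

In this sketch both alternatives produce the same rows, but alternative 1
joins all the rows before discarding any, while alternative 2 only joins the
rows that pass the filter. Whether Spark behaves the same way on RDDs is
exactly what I am unsure about.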


Or is there probably no difference, due to lazy evaluation and the
optimisation Spark does under the hood?

best,
/Shahab
