It is basically a Cartesian join, as in an RDBMS. Example:

SELECT * FROM FinancialCodes, FinancialData

The result of this query matches every row in the FinancialCodes table with every row in the FinancialData table. Each row consists of all columns from the FinancialCodes table followed by all columns from the FinancialData table. Not very useful.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 25 May 2016 at 08:05, Priya Ch <learnings.chitt...@gmail.com> wrote:

> Hi All,
>
> I have two RDDs, A and B, where A is of size 30 MB and B is of size 7 MB.
> A.cartesian(B) is taking too much time. Is there any bottleneck in the
> cartesian operation?
>
> I am using Spark version 1.6.0.
>
> Regards,
> Padma Ch
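
As a rough illustration of the Cartesian join described in the reply above, the following Python sketch pairs every row of one small list with every row of another. The row contents are made up for illustration (the thread does not show the actual tables), and `cartesian_join` is a hypothetical helper, not a Spark or RDBMS API.

```python
from itertools import product

# Hypothetical stand-ins for the two tables; the real schemas are not
# given in the thread, so these rows are invented for illustration only.
financial_codes = [("C1", "Equity"), ("C2", "Bond"), ("C3", "Cash")]
financial_data = [(2015, 100.0), (2016, 250.0)]


def cartesian_join(left, right):
    """Pair every row of `left` with every row of `right`.

    Each output row holds all columns of the left row followed by all
    columns of the right row, mirroring the SELECT * behaviour above.
    """
    return [l_row + r_row for l_row, r_row in product(left, right)]


rows = cartesian_join(financial_codes, financial_data)

# The output size is the product of the input sizes: 3 * 2 = 6 rows.
print(len(rows))
for row in rows:
    print(row)
```

Because the number of output rows is the product of the two input sizes, the result can be far larger than either input, which may be part of why a Cartesian operation feels slow even on modest data.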