Cartesian joins tend to produce huge results and are inherently slow. If RDD B has N records, then your result will be at least N * 30 MB, since all the rows of A have to be replicated for every single record in B.
Assuming RDD B has 10,000 records then you can see that your cartesian join will give an RDD that takes at least 300 GB, presumably more than the RAM on your system...

On Wed, May 25, 2016 at 3:05 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
> Hi All,
>
> I have two RDDs A and B where in A is of size 30 MB and B is of size 7
> MB, A.cartesian(B) is taking too much time. Is there any bottleneck in
> cartesian operation ?
>
> I am using spark 1.6.0 version
>
> Regards,
> Padma Ch
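
For anyone who wants to redo the arithmetic from the reply with their own numbers, here is a rough back-of-the-envelope sketch in plain Python (no Spark needed). The function names are made up for illustration, sizes use decimal units (1 MB = 10^6 bytes), and the 10,000-record count for B is the assumption from the reply, not a measured value. It ignores serialization and per-object overhead, so treat the output as a lower bound only.

```python
MB = 10**6  # decimal megabyte, matching the rough figures in the thread
GB = 10**9  # decimal gigabyte


def cartesian_lower_bound_bytes(size_a_bytes, records_b):
    """Lower bound on the size of A.cartesian(B).

    All of A is replicated once per record of B, so the result holds
    at least records_b copies of A's data.
    """
    return size_a_bytes * records_b


def cartesian_estimate_bytes(size_a_bytes, records_a, size_b_bytes, records_b):
    """Slightly fuller estimate that also counts the B side of each pair.

    Each of the records_a * records_b pairs holds one row of A and one
    row of B, so B's data is replicated once per record of A as well.
    records_a is a hypothetical input; the thread does not state it.
    """
    return size_a_bytes * records_b + size_b_bytes * records_a


# Figures from the thread: A is 30 MB, B is assumed to have 10,000 records.
lower = cartesian_lower_bound_bytes(30 * MB, 10_000)
print(lower / GB)  # 300.0
```

If the record count of A is known as well, `cartesian_estimate_bytes` adds the replicated copies of B on top of that bound.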