Re: Cartesian join on RDDs taking too much time

Max Sperlich Wed, 25 May 2016 06:56:37 -0700

Cartesian joins produce a huge result and are inherently slow.
If RDD B has N records, your result size will be at least N * 30 MB,
since all the rows of A have to be replicated for every record in B.

Assuming RDD B has 10,000 records, your cartesian join will give an RDD
that takes at least 10,000 * 30 MB = 300 GB, presumably more than the RAM
on your system...
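
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch
in plain Python (no Spark needed). The helper name cartesian_size_mb and the
10,000-record count for B are assumptions for illustration only; the real
record count of B is not given in the question, and the actual in-memory size
will be larger once the paired records from B and object overhead are counted.

```python
def cartesian_size_mb(size_a_mb, records_b):
    """Lower bound, in MB, on the size of A.cartesian(B).

    Every record of B is paired with every row of A, so A's data
    is effectively repeated once per record of B.
    """
    return size_a_mb * records_b


size_a_mb = 30        # size of RDD A, from the original question
records_b = 10_000    # assumed record count for RDD B (hypothetical)

lower_bound_mb = cartesian_size_mb(size_a_mb, records_b)

# Convert using decimal units (1 GB = 1000 MB) to match the estimate above.
print(f"A.cartesian(B) needs at least {lower_bound_mb / 1000:.0f} GB")
```

Plugging in the real record count of B gives a quick check on whether the
result can fit in memory before running the job.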

On Wed, May 25, 2016 at 3:05 AM, Priya Ch <learnings.chitt...@gmail.com>
wrote:

> Hi All,
>
>   I have two RDDs, A and B, where A is of size 30 MB and B is of size 7
> MB. A.cartesian(B) is taking too much time. Is there any bottleneck in
> the cartesian operation?
>
> I am using Spark version 1.6.0.
>
> Regards,
> Padma Ch
>
