Hi Vinay,
just out of curiosity: why are you converting your DataFrames into RDDs
before the join? Joins work quite well with DataFrames.
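For instance, a plain DataFrame join looks like this (just a sketch you
can paste into spark-shell; the data and the column names are invented,
not taken from your job):

    // Assumes spark-shell, so `spark` and its implicits already exist.
    import spark.implicits._

    // Invented stand-ins for your two inputs.
    val orders = Seq((1L, "books"), (2L, "games")).toDF("userId", "item")
    val users  = Seq((1L, "alice"), (2L, "bob")).toDF("userId", "name")

    // Catalyst plans this join (and can e.g. broadcast the smaller
    // side), which a hand-rolled RDD join cannot benefit from.
    val joined = orders.join(users, Seq("userId"))
    joined.explain()   // shows the physical plan Spark picked
    joined.show()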
As for your problem, it looks like you gave your executors more memory
than you physically have. As an example of an executor configuration:
> Cluster of 6 n
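To make that concrete, here is a rough spark-submit sizing sketch. The
node sizes (6 worker nodes, 64 GB RAM and 16 cores each, on YARN) are
my assumption for illustration, not your actual cluster:

    # Per node: leave ~1 GB and 1 core for the OS/NodeManager, and
    # remember YARN adds spark.yarn.executor.memoryOverhead (~10% of
    # the heap by default) on top of --executor-memory, so
    #   executors_per_node * (heap + overhead) < physical RAM.
    # Here: 3 executors/node * (19 GB + ~2 GB) = ~63 GB < 64 GB, and
    # 17 = 3 executors/node * 6 nodes, minus 1 for the ApplicationMaster.
    spark-submit \
      --master yarn \
      --num-executors 17 \
      --executor-cores 5 \
      --executor-memory 19G \
      --class com.example.YourJob your-job.jar

If the per-node sum exceeds physical RAM, YARN will either refuse to
allocate the containers or the OS will start killing them.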
Could you have a very large key? Perhaps a token value?
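A quick way to check for that (a sketch, meant for spark-shell; `df`
stands for one of your join inputs and "key" for your join column, both
placeholders):

    import org.apache.spark.sql.functions.desc

    // Count rows per join key and look at the heaviest ones; a single
    // enormous key at the top often explains OOMs during a join.
    df.groupBy("key")
      .count()
      .orderBy(desc("count"))
      .show(20)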
I love the RDD API, but for joins I have found that DataFrames/Datasets
perform better. Maybe you can do the joins with that API instead?
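If you want to keep static types like with RDDs, the typed Dataset API
can do that too. A sketch (the case classes and values are invented;
assumes spark-shell, so `spark` is in scope):

    import spark.implicits._

    case class Event(userId: Long, action: String)
    case class Profile(userId: Long, country: String)

    val events   = Seq(Event(1L, "click"), Event(2L, "view")).toDS()
    val profiles = Seq(Profile(1L, "US"), Profile(2L, "DE")).toDS()

    // joinWith keeps compile-time types, but the join itself still
    // goes through the Catalyst optimizer.
    val joined = events.joinWith(profiles,
      events("userId") === profiles("userId"))
    joined.show()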
On Thu, Aug 11, 2016 at 7:41 PM, Muttineni, Vinay wrote:
> Hello,
>
> I have a Spark job that basically reads data from