Re: Map side join without broadcast

Arbab Khalil Sat, 29 Jun 2019 15:37:08 -0700

You can use coalesce(1) or repartition on B, but it would be better to cache A
so that it is available on all executors as well as in memory, because it
contains only one row.


On Sat, Jun 29, 2019 at 4:10 PM jelmer <jkupe...@gmail.com> wrote:

> I have 2 dataframes,
>
> Dataframe A which contains 1 element per partition that is gigabytes big
> (an index)
>
> Dataframe B which is made up of millions of small rows.
>
> I want to join B on A but I want all the work to be done on the executors
> holding the partitions of dataframe A
>
> Is there a way to accomplish this without putting dataframe B in a
> broadcast variable or doing a broadcast join?
>
>

-- 
Regards,
Arbab Khalil
Software Design Engineer
