I have two dataframes:

Dataframe A, which contains one element per partition; each element is
an index that is gigabytes in size.

Dataframe B, which is made up of millions of small rows.

I want to join B on A, but I want all the work to be done on the executors
holding the partitions of dataframe A.

Is there a way to accomplish this without putting dataframe B in a
broadcast variable or doing a broadcast join?
