Hi All,
Just checking in to see if anyone has any advice on this.
Thanks,
Rishi
On Mon, Mar 2, 2020 at 9:21 PM Rishi Shah wrote:
Hi All,
I have 2 large tables (~1TB) and used the following to save both of them.
When I then try to join the two tables on join_column, it still does a shuffle
and sort before the join. Could someone please help?
df.repartition(2000).write.bucketBy(1, join_column).sortBy(join_column).saveAsTable(ta