Hi All,

I have two large tables (~1TB) and saved both of them using the command
below. When I then join the two tables on join_column, Spark still does a
shuffle and sort before the join. Could someone please help?

df.repartition(2000).write.bucketBy(1, join_column).sortBy(join_column).saveAsTable(tablename)

-- 
Regards,

Rishi Shah
