Hi All, I have two large tables (~1TB) and saved both of them using the command below. When I then join the two tables on join_column, Spark still does a shuffle and sort before the join. Could someone please help?
df.repartition(2000).write.bucketBy(1, join_column).sortBy(join_column).saveAsTable(tablename)

--
Regards,
Rishi Shah