Hi,

I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls the data 
from a Hive table, adds a few extra columns, removes duplicates, and then 
writes it back to Hive.
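
To give a rough idea, the job is shaped like this (the table and column names 
below are just placeholders, not the real ones):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("hive-dedup-job")   // placeholder app name
  .enableHiveSupport()
  .getOrCreate()

// pull the source Hive table
val src = spark.table("db.source_table")

// add the extra columns and drop duplicate rows
val cleaned = src
  .withColumn("load_dt", current_date())
  .withColumn("src_system", lit("example"))
  .dropDuplicates()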

In the Spark UI, the write is taking almost 40 minutes for about 400 GB of 
data. Is there anything I can do to improve performance?

spark.sql.shuffle.partitions is 2000 in my case, with 16 GB of executor memory 
and dynamic allocation enabled.
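
The shuffle partition count is set at runtime on the session from the sketch 
above; executor memory and dynamic allocation are passed on the spark-submit 
command line:

// 2000 shuffle partitions for the dedup and the final write
spark.conf.set("spark.sql.shuffle.partitions", "2000")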

I am doing an insert overwrite into a partitioned table:
df.write.mode("overwrite").insertInto(table)

Any suggestions, please?

Sent from my iPhone