Hi, I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls data from a Hive table, adds a few extra columns, removes duplicates, and then writes the result back to Hive.
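In code it looks roughly like this (the table and column names below are placeholders for illustration, not my real schema):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("hive-transform-job")
  .enableHiveSupport()        // needed to read from and write back to Hive
  .getOrCreate()

val transformed = spark.table("source_db.source_table")   // pull from the Hive table
  .withColumn("load_date", current_date())                // add extra columns
  .withColumn("source_system", lit("upstream"))
  .dropDuplicates()                                        // remove duplicates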
In the Spark UI it takes almost 40 minutes to write 400 GB of data. Is there anything I can do to improve performance? In my case spark.sql.shuffle.partitions is 2000, executor memory is 16 GB, and dynamic allocation is enabled. I am doing an insert overwrite into a partitioned table with df.write.mode("overwrite").insertInto(table). Any suggestions?
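For reference, here is roughly how the settings and the write look spelled out in code. I am assuming the shuffle-partition setting above is spark.sql.shuffle.partitions; whether the other values are passed via spark-submit or on the builder shouldn't matter, they are shown on the builder only to make them explicit, and the table names are again placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-transform-job")
  .enableHiveSupport()
  .config("spark.sql.shuffle.partitions", "2000")      // shuffle parallelism used by dropDuplicates
  .config("spark.executor.memory", "16g")              // per-executor memory
  .config("spark.dynamicAllocation.enabled", "true")   // executors scale with load
  .getOrCreate()

val transformed = spark.table("staging_db.transformed")  // stand-in for the deduplicated DataFrame

// Insert-overwrite into the partitioned target table. Note that insertInto
// resolves columns by position, so the partition column must be the last
// column of the DataFrame, matching the target table's schema.
transformed.write.mode("overwrite").insertInto("target_db.target_table")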