Hi Vijay,

Thanks for the follow-up.
We have 90 HDFS files (hence the parallelism of 90 for the HDFS read stage) because we load the same HDFS data in several jobs, and those jobs have parallelisms (executors × cores) of 9, 18 and 30. The uneven assignment problem we had before cannot be explained by a modulo/remainder effect, because we sometimes had only 2 executors active out of 9 while the remaining 7 stayed completely idle.

We tried repartitioning the Kafka stream to 90 partitions, but that led to an even worse load imbalance. It seems that keeping the number of partitions equal to executors × cores reduces the chance of uneven assignment. We also tried repartitioning the HDFS data to 9 partitions, but that did not help either: repartition takes the initial data locality into account, so the 9 partitions do not necessarily end up on 9 different cores. Setting spark.shuffle.reduceLocality.enabled=false did not help either. Last but not least, we want to avoid coalesce, because then the partitions would follow the HDFS block distribution and would not be hash partitioned, which we need for the join. (Rough sketches of the two variants we tried are at the bottom of this mail.)

Please find below the relevant UI snapshots:

http://apache-spark-user-list.1001560.n3.nabble.com/file/t8940/jobOverview.png
http://apache-spark-user-list.1001560.n3.nabble.com/file/t8940/jobDetails.png

The snapshots refer to the batch in which the RDD is reloaded (WholeStageCodegen 1147 is gray except in the batch at reload time, which happens every 30 minutes).

Thanks a lot!
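P.S. For completeness, here is roughly what the Kafka-side variant we tried looks like. It is only a sketch assuming the DStream API (spark-streaming-kafka-0-10); the broker, topic, group id, deserializers and the 30-second batch interval are placeholders, not our exact settings.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-repartition-sketch")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Placeholder Kafka settings.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "sketch-group"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    val keyed = stream.map(record => (record.key, record.value))

    // What we tried: blowing the stream up to 90 partitions to match the HDFS
    // side. For us this made the imbalance worse, so we keep the stream at
    // executors x cores (e.g. 9) instead.
    val ninety = keyed.repartition(90)
    val nine = keyed.repartition(9)

    ninety.count().print()  // placeholder output so the sketch does something
    nine.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}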
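And the HDFS-side variants, again only as a rough sketch: the path, the tab-separated record layout and the string keys are placeholders, and the partitionBy line is just one way to get a hash-partitioned RDD, not necessarily how our real job builds it.

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object HdfsPartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-partitioning-sketch")
      // The setting we tried; it did not remove the skew for us.
      .config("spark.shuffle.reduceLocality.enabled", "false")
      .getOrCreate()
    val sc = spark.sparkContext

    // 90 HDFS files -> 90 partitions in the read stage.
    val hdfsPairs = sc
      .textFile("hdfs:///data/reference/*")        // placeholder path
      .map { line =>
        val Array(k, v) = line.split("\t", 2)      // assumed two-column layout
        (k, v)
      }

    // Variant 1: repartition(9) shuffles into 9 partitions, but the result
    // carries no partitioner, and in our runs the tasks were still scheduled
    // unevenly across executors.
    val byRepartition = hdfsPairs.repartition(9)

    // Variant 2: coalesce(9) merges partitions without a shuffle, so the
    // layout follows the HDFS blocks and is not hash partitioned either.
    val byCoalesce = hdfsPairs.coalesce(9)

    // One way to get the hash partitioning the join needs: an explicit
    // HashPartitioner with partitions == executors x cores.
    val byHash = hdfsPairs.partitionBy(new HashPartitioner(9)).cache()

    println(s"repartition: partitioner = ${byRepartition.partitioner}")
    println(s"coalesce:    partitioner = ${byCoalesce.partitioner}")
    println(s"hash:        partitioner = ${byHash.partitioner}")

    spark.stop()
  }
}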