Your input is skewed with respect to the default hash partitioner. One option is to write a custom partitioner that redistributes the data evenly among your executors.
I think you will see the same behaviour when you use more executors; the skew just appears smaller. To prove this, run the same job with an even bigger input.
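To make the custom-partitioner suggestion concrete, here is a minimal sketch (untested, and the names are mine, not from this thread). It assumes you know one hot key that dominates the skew: that key gets its own partition, and everything else is hashed across the remaining ones, much as the default HashPartitioner would do.

    import org.apache.spark.Partitioner

    // Illustrative skew-aware partitioner: pin a known hot key to its
    // own partition so it no longer crowds out the rest of the data.
    class SkewAwarePartitioner(override val numPartitions: Int, hotKey: Any)
        extends Partitioner {

      require(numPartitions >= 2, "need a partition for the hot key plus at least one more")

      override def getPartition(key: Any): Int =
        if (key == hotKey) {
          0  // dedicated partition for the hot key
        } else {
          // non-negative modulo over the remaining partitions,
          // same placement logic HashPartitioner uses
          val h = if (key == null) 0 else key.hashCode
          val n = numPartitions - 1
          1 + (((h % n) + n) % n)
        }
    }

You would then plug it into the shuffle, e.g. pairRdd.partitionBy(new SkewAwarePartitioner(200, "theHotKey")). Note that getPartition must stay deterministic per key, otherwise joins and aggregations will misbehave; if the skew comes from many keys rather than one, salting the keys before the shuffle is the usual alternative.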