I am implementing WordCount on a Spark cluster (1 master, 3 slaves) in standalone mode. I have 546 GB of data, and I set dfs.blocksize to 256 MB, so the number of tasks is 2186. Each of my 3 slaves uses 22 cores and 72 GB of memory for the processing, so the computing capacity of each slave should be the same.
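(As a rough sanity check on the task count, assuming 1 GB = 1024 MB:

    546 GB = 546 * 1024 MB = 559,104 MB
    559,104 MB / 256 MB per block = about 2184 blocks

which is close to the 2186 map tasks; the small difference is presumably from partial blocks at the ends of files.)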
Since WordCount has just two parts, map and reduce, I think that in each stage each task handles one partition, so the duration of each task should be nearly the same, right? However, from the event timeline in the job UI, I see that the duration of the tasks in the mapToPair stage varies a lot, and there are many very short tasks. I don't know whether this is normal or a problem on my side.

Here is a picture of the event timeline:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n24008/QQ%E6%88%AA%E5%9B%BE20150727172511.png>

The number of tasks assigned to each slave also differs:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n24008/QQ%E6%88%AA%E5%9B%BE20150727172739.png>

Does anybody have any idea about this? Thanks in advance.
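For reference, here is roughly what the job looks like (a simplified sketch against the Spark 1.x Java API; the app name and the input/output paths are placeholders, not my actual values):

    import java.util.Arrays;
    import scala.Tuple2;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class WordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("WordCount");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // One partition (hence one map task) per HDFS block of the input
            JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input");

            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")))  // split lines into words
                .mapToPair(word -> new Tuple2<>(word, 1))         // the mapToPair stage shown in the UI
                .reduceByKey((a, b) -> a + b);                    // shuffle + reduce stage

            counts.saveAsTextFile("hdfs:///path/to/output");
            sc.stop();
        }
    }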