I am implementing wordcount on a Spark cluster (1 master, 3 slaves) in
standalone mode. I have 546 GB of data, and the dfs.blocksize I set is 256 MB,
so the number of tasks is 2186. Each of my 3 slaves uses 22 cores and
72 GB of memory for the processing, so the computing ability of each slave
should be the same.
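
For reference, the driver is set up roughly like this (a minimal sketch,
assuming the Java API; the master hostname is a placeholder, and the resource
settings just restate the numbers above):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class WordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                .setAppName("WordCount")
                .setMaster("spark://master:7077")     // standalone master; hostname is a placeholder
                .set("spark.executor.memory", "72g")  // 72 GB per slave
                .set("spark.cores.max", "66");        // 22 cores x 3 slaves
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... wordcount job, see the sketch below ...
            sc.stop();
        }
    }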

Since wordcount has just two parts, map and reduce, I think that in each
stage each task handles one partition, so the length of each task should be
nearly the same, right?
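
Concretely, the job body is just the standard two-step wordcount (a sketch
assuming the Spark 1.x Java API, where the flatMap lambda returns an Iterable;
the input/output paths are placeholders):

    import java.util.Arrays;
    import scala.Tuple2;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;

    // One input partition (and hence one map task) per 256 MB HDFS block.
    JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input");

    JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split(" ")))   // split lines into words
        .mapToPair(word -> new Tuple2<>(word, 1))          // the mapToPair stage shown in the UI
        .reduceByKey((a, b) -> a + b);                     // shuffle + sum per word

    counts.saveAsTextFile("hdfs:///path/to/output");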

However, from the event timeline in the job UI, I found that the length of
each task in the mapToPair stage varies a lot and there are many very short
tasks. I don't know whether this is normal or a problem on my side.
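
In case it helps, I can also dump the record count per input partition to see
whether the input itself is skewed (a rough check, assuming `lines` is the RDD
read from HDFS in the sketch above; glom() materializes each partition as a
list, which should be fine at 256 MB per partition):

    import java.util.List;

    // Number of records in each input partition; large differences here would
    // point at input skew rather than at the scheduler.
    List<Integer> partitionSizes = lines.glom().map(List::size).collect();
    System.out.println(partitionSizes);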

Here is a picture of the event timeline:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n24008/QQ%E6%88%AA%E5%9B%BE20150727172511.png>
 

And the number of tasks assigned to each slave is also different:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n24008/QQ%E6%88%AA%E5%9B%BE20150727172739.png>
 

Does anybody have any idea about this? Thanks in advance.


