Hi,

I am running Spark 1.2.1 for compute-intensive jobs comprising multiple
tasks. I have observed that most tasks complete very quickly, but there are
always one or two tasks that take much longer to complete, which increases
the overall stage time. What could be the reason for this?

The statistics for one such stage follow. As you can see, the task with
index 0 took 1.1 minutes, whereas most of the others completed in 10 to 30 ms.

Aggregated Metrics by Executor
Executor ID  Address       Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input  Output  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
0            slave1:56311  46 s       13           0             13               0.0 B  0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
1            slave2:42648  2.1 min    13           0             13               0.0 B  0.0 B   384.3 KB      0.0 B          0.0 B                   0.0 B
2            slave3:44322  23 s       12           0             12               0.0 B  0.0 B   136.4 KB      0.0 B          0.0 B                   0.0 B
3            slave4:37987  44 s       12           0             12               0.0 B  0.0 B   213.9 KB      0.0 B          0.0 B                   0.0 B

Tasks
Index  ID   Attempt  Status   Locality Level  Executor ID / Host  Launch Time          Duration  GC Time  Shuffle Read  Errors
0      213  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  1.1 min   1 s      153.3 KB
5      218  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
1      214  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  2 s       0.9 s    13.8 KB
4      217  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  26 ms              26.0 B
3      216  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms              0.0 B
2      215  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  27 ms              26.0 B
7      220  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms              0.0 B
10     223  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms              26.0 B
6      219  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms              26.0 B
9      222  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
8      221  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  23 ms              26.0 B
11     224  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
14     227  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms              26.0 B
13     226  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
16     229  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms              26.0 B
12     225  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms              26.0 B
15     228  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
17     230  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  22 ms              26.0 B
23     236  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
22     235  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  21 ms              26.0 B
19     232  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
21     234  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  25 ms              26.0 B
18     231  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms              26.0 B
20     233  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  28 ms              26.0 B
25     238  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  20 ms              26.0 B
28     241  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  27 ms              26.0 B
27     240  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
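
To put a number on the imbalance, here is a minimal Python sketch. It is separate from the Spark job and only re-reads a few rows of the task table above. The helper names (parse, skew_ratio) are my own, and it assumes the UI's "KB" means 1024 bytes. It compares the largest value against the median for both duration and shuffle read.

```python
from statistics import median

# (duration, shuffle read) copied from a few rows of the task table above,
# keyed by task index. Only a subset of rows is repeated here.
tasks = {
    0: ("1.1 min", "153.3 KB"),
    1: ("2 s", "13.8 KB"),
    5: ("23 ms", "26.0 B"),
    4: ("26 ms", "26.0 B"),
    3: ("11 ms", "0.0 B"),
    2: ("27 ms", "26.0 B"),
}

TIME_UNITS = {"ms": 0.001, "s": 1.0, "min": 60.0}   # converted to seconds
SIZE_UNITS = {"B": 1.0, "KB": 1024.0}                # assumption: 1 KB = 1024 B


def parse(text, units):
    """Turn a UI string such as '1.1 min' or '26.0 B' into a float."""
    value, unit = text.split()
    return float(value) * units[unit]


def skew_ratio(values):
    """Largest value divided by the median; close to 1 means balanced."""
    vals = list(values)
    return max(vals) / median(vals)


durations = {i: parse(d, TIME_UNITS) for i, (d, _) in tasks.items()}
reads = {i: parse(r, SIZE_UNITS) for i, (_, r) in tasks.items()}

slowest = max(durations, key=durations.get)
print("slowest task index:", slowest)
print("duration max/median:", round(skew_ratio(durations.values()), 1))
print("shuffle read max/median:", round(skew_ratio(reads.values()), 1))
```

In these rows the slowest task is also the one with the largest shuffle read, so the input to that task appears to be much larger than the input to the others.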


Thanks



