Hi, I am running Spark 1.2.1 for compute-intensive jobs comprising multiple tasks. I have observed that most tasks complete very quickly, but there are always one or two tasks that take much longer to complete, thereby increasing the overall stage time. What could be the reason for this?
Following are the statistics for one such stage. As you can see, the task with index 0 took 1.1 minutes, whereas the others completed much more quickly.

Aggregated Metrics by Executor

Executor ID  Address       Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input  Output  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
0            slave1:56311  46 s       13           0             13               0.0 B  0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
1            slave2:42648  2.1 min    13           0             13               0.0 B  0.0 B   384.3 KB      0.0 B          0.0 B                   0.0 B
2            slave3:44322  23 s       12           0             12               0.0 B  0.0 B   136.4 KB      0.0 B          0.0 B                   0.0 B
3            slave4:37987  44 s       12           0             12               0.0 B  0.0 B   213.9 KB      0.0 B          0.0 B                   0.0 B

Tasks

Index  ID   Attempt  Status   Locality Level  Executor ID / Host  Launch Time          Duration  GC Time  Shuffle Read  Errors
0      213  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  1.1 min   1 s      153.3 KB
5      218  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
1      214  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  2 s       0.9 s    13.8 KB
4      217  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  26 ms              26.0 B
3      216  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms              0.0 B
2      215  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  27 ms              26.0 B
7      220  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  11 ms              0.0 B
10     223  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms              26.0 B
6      219  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  23 ms              26.0 B
9      222  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
8      221  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  23 ms              26.0 B
11     224  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
14     227  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms              26.0 B
13     226  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  23 ms              26.0 B
16     229  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms              26.0 B
12     225  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  22 ms              26.0 B
15     228  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
17     230  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  22 ms              26.0 B
23     236  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
22     235  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  21 ms              26.0 B
19     232  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B
21     234  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  25 ms              26.0 B
18     231  0        SUCCESS  PROCESS_LOCAL   2 / slave3          2015/02/19 11:40:05  24 ms              26.0 B
20     233  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  28 ms              26.0 B
25     238  0        SUCCESS  PROCESS_LOCAL   3 / slave4          2015/02/19 11:40:05  20 ms              26.0 B
28     241  0        SUCCESS  PROCESS_LOCAL   1 / slave2          2015/02/19 11:40:05  27 ms              26.0 B
27     240  0        SUCCESS  PROCESS_LOCAL   0 / slave1          2015/02/19 11:40:05  10 ms              0.0 B

Thanks
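
To put a rough number on the imbalance in the task statistics above, here is a minimal Python sketch that works only on a handful of rows copied by hand from the Tasks listing; it does not touch Spark at all. The names `TASKS`, `slowest_task`, `duration_skew`, and `shuffle_read_share` are hypothetical helpers introduced for illustration. Two assumptions are baked in: "1.1 min" is taken as about 66 seconds, and 1 KB is taken as 1024 bytes.

```python
from statistics import median

# Hand-copied sample of rows from the Tasks listing above.
# task index -> (duration in seconds, shuffle read in bytes)
# Assumptions: "1.1 min" ~ 66 s, and 1 KB = 1024 bytes.
TASKS = {
    0: (66.0, 153.3 * 1024),
    1: (2.0, 13.8 * 1024),
    5: (0.023, 26.0),
    4: (0.026, 26.0),
    3: (0.011, 0.0),
    2: (0.027, 26.0),
}


def slowest_task(tasks):
    """Return the index of the task with the longest duration."""
    return max(tasks, key=lambda index: tasks[index][0])


def duration_skew(tasks):
    """Ratio of the longest task duration to the median duration."""
    durations = [duration for duration, _ in tasks.values()]
    return max(durations) / median(durations)


def shuffle_read_share(tasks, index):
    """Fraction of the sampled shuffle-read bytes consumed by one task."""
    total = sum(read_bytes for _, read_bytes in tasks.values())
    return tasks[index][1] / total if total else 0.0


if __name__ == "__main__":
    slow = slowest_task(TASKS)
    print("slowest task index:", slow)
    print("max/median duration ratio: %.0f" % duration_skew(TASKS))
    print("shuffle-read share of slowest task: %.2f" % shuffle_read_share(TASKS, slow))
```

If the slowest task also accounts for most of the shuffle read, as it does in this sample, that would be consistent with data being unevenly distributed across partitions, though it does not by itself establish the cause of the slowdown.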