I have also received the same error message quite a few times while saving an RDD to HDFS. I am using Spark 1.1.0 with Hadoop 2.5 in YARN mode.
If you check the logs, you may find entries like the following:

14/10/10 14:20:21 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(6, sparkmaster.company.com, 44906, 0) with no recent heart beats: 71967ms exceeds 45000ms
14/10/10 14:31:15 INFO scheduler.TaskSetManager: Finished task 46.0 in stage 4.0 (TID 544) in 734650 ms on sparknode1.company.com (46/50)
14/10/10 14:55:31 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(sparkmaster.company.com,44906)
14/10/10 14:55:31 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkmaster.company.com,44906)
14/10/10 14:55:31 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkmaster.company.com,44906)
14/10/10 14:55:31 INFO cluster.YarnClusterSchedulerBackend: Executor 6 disconnected, so removing it
14/10/10 14:55:31 ERROR cluster.YarnClusterScheduler: Lost executor 6 on sparkmaster.company.com: remote Akka client disassociated
14/10/10 14:55:31 INFO scheduler.TaskSetManager: Re-queueing tasks for 6 from TaskSet 4.0
14/10/10 14:55:31 WARN scheduler.TaskSetManager: Lost task 45.0 in stage 4.0 (TID 543, sparkmaster.company.com): ExecutorLostFailure (executor lost)
14/10/10 14:55:31 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 10)
14/10/10 14:55:31 INFO storage.BlockManagerMasterActor: Trying to remove executor 6 from BlockManagerMaster.
14/10/10 14:55:31 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor
14/10/10 14:55:31 INFO scheduler.TaskSetManager: Starting task 45.1 in stage 4.0 (TID 548, sparknode1.company.com, PROCESS_LOCAL, 948 bytes)

If you are running on YARN, the lost task is rescheduled on another executor (task 45.1 above) and processing continues.
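To check whether your own driver log shows the same pattern, something like the following might help. It is only a rough sketch: the function name scan_driver_log and the two regular expressions are my own, written against the exact message wording in the log excerpt above, so they may need adjusting for other Spark versions.

```python
import re

# Patterns written against the Spark 1.1.0 messages quoted in this thread.
# They are assumptions about the wording, not an official log format.
HEARTBEAT = re.compile(
    r"Removing BlockManager BlockManagerId\((\w+), ([^,]+), "
    r".*?no recent heart beats: (\d+)ms exceeds (\d+)ms"
)
LOST = re.compile(r"Lost executor (\w+) on ([^:\s]+): (.*)")


def scan_driver_log(lines):
    """Collect heartbeat timeouts and lost-executor events from log lines."""
    events = []
    for line in lines:
        m = HEARTBEAT.search(line)
        if m:
            # (kind, executor id, host, milliseconds since last heartbeat)
            events.append(("heartbeat_timeout", m.group(1), m.group(2), int(m.group(3))))
            continue
        m = LOST.search(line)
        if m:
            # (kind, executor id, host, reason reported by the scheduler)
            events.append(("executor_lost", m.group(1), m.group(2), m.group(3).strip()))
    return events


sample = [
    "14/10/10 14:20:21 WARN storage.BlockManagerMasterActor: Removing BlockManager "
    "BlockManagerId(6, sparkmaster.company.com, 44906, 0) with no recent heart beats: "
    "71967ms exceeds 45000ms",
    "14/10/10 14:55:31 ERROR cluster.YarnClusterScheduler: Lost executor 6 on "
    "sparkmaster.company.com: remote Akka client disassociated",
    "14/10/10 14:55:31 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 10)",
]

events = scan_driver_log(sample)
for event in events:
    print(event)
```

A heartbeat timeout followed later by a lost executor with the same id, as in my log, suggests the executor stopped responding well before the driver gave up on it.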
You can try updating the following attributes in spark-defaults.conf (both values are in seconds):

spark.core.connection.ack.wait.timeout 3600
spark.core.connection.auth.wait.timeout 3600
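If you would rather script the change than edit the file by hand, here is a minimal sketch of how the two timeout settings could be merged into the contents of an existing spark-defaults.conf. The helper name set_spark_defaults and the sample file contents are hypothetical; the sketch assumes one property per line with the key separated from the value by whitespace, "=" or ":".

```python
import re

# The two settings suggested in this thread (values in seconds).
TIMEOUTS = {
    "spark.core.connection.ack.wait.timeout": "3600",
    "spark.core.connection.auth.wait.timeout": "3600",
}


def set_spark_defaults(conf_text, updates):
    """Return conf_text with every key in `updates` set to its new value.

    Existing entries for those keys are replaced in place; keys that are
    not present yet are appended at the end. Comments and blank lines are
    left untouched.
    """
    remaining = dict(updates)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        # The key is everything before the first separator.
        key = re.split(r"[=:\s]+", stripped, maxsplit=1)[0]
        if key in remaining:
            out.append(f"{key} {remaining.pop(key)}")
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key} {value}")
    return "\n".join(out) + "\n"


# Hypothetical existing file contents, for illustration only.
existing = (
    "# site defaults\n"
    "spark.master yarn-cluster\n"
    "spark.core.connection.ack.wait.timeout 60\n"
)

updated = set_spark_defaults(existing, TIMEOUTS)
print(updated)
```

The function works on text rather than on a file path so you can review the result before writing it back to your conf directory.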