I also received the same error message quite a few times while saving an RDD to
HDFS.
I am using Spark 1.1.0 with Hadoop 2.5 in YARN mode.

If you check the logs, you might find entries like the following:

14/10/10 14:20:21 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(6, sparkmaster.company.com, 44906, 0) with no recent heart beats: 71967ms exceeds 45000ms
14/10/10 14:31:15 INFO scheduler.TaskSetManager: Finished task 46.0 in stage 4.0 (TID 544) in 734650 ms on sparknode1.company.com (46/50)
14/10/10 14:55:31 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(sparkmaster.company.com,44906)
14/10/10 14:55:31 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkmaster.company.com,44906)
14/10/10 14:55:31 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(sparkmaster.company.com,44906)
14/10/10 14:55:31 INFO cluster.YarnClusterSchedulerBackend: Executor 6 disconnected, so removing it
14/10/10 14:55:31 ERROR cluster.YarnClusterScheduler: Lost executor 6 on sparkmaster.company.com: remote Akka client disassociated
14/10/10 14:55:31 INFO scheduler.TaskSetManager: Re-queueing tasks for 6 from TaskSet 4.0
14/10/10 14:55:31 WARN scheduler.TaskSetManager: Lost task 45.0 in stage 4.0 (TID 543, sparkmaster.company.com): ExecutorLostFailure (executor lost)
14/10/10 14:55:31 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 10)
14/10/10 14:55:31 INFO storage.BlockManagerMasterActor: Trying to remove executor 6 from BlockManagerMaster.
14/10/10 14:55:31 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor
14/10/10 14:55:31 INFO scheduler.TaskSetManager: Starting task 45.1 in stage 4.0 (TID 548, sparknode1.company.com, PROCESS_LOCAL, 948 bytes)
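In a long driver log these entries are easy to miss, so a small script can pull them out. Below is a minimal sketch, assuming each log entry sits on a single line and that the message wording matches the Spark 1.1 excerpt above (other versions may word it differently). The function name summarize_executor_loss is hypothetical, not part of Spark. It reports the heartbeat timeout (executor 6 silent for 71967 ms against a 45000 ms limit) and the matching "Lost executor" entry.

```python
import re

# Patterns for two messages from the Spark 1.1 excerpt above; other Spark
# versions may word these messages differently.
HEARTBEAT_RE = re.compile(
    r"Removing BlockManager BlockManagerId\((\d+), ([^,]+), \d+, \d+\) "
    r"with no recent heart beats: (\d+)ms exceeds (\d+)ms"
)
LOST_EXECUTOR_RE = re.compile(r"Lost executor (\d+) on ([^:]+): (.*)")


def summarize_executor_loss(log_text):
    """Collect heartbeat timeouts and lost-executor reports from a driver log.

    Assumes one log entry per line (no mail-client wrapping).
    """
    timeouts, lost = [], []
    for line in log_text.splitlines():
        m = HEARTBEAT_RE.search(line)
        if m:
            executor, host, silent_ms, limit_ms = m.groups()
            timeouts.append((executor, host, int(silent_ms), int(limit_ms)))
        m = LOST_EXECUTOR_RE.search(line)
        if m:
            lost.append(m.groups())
    return timeouts, lost


# Two entries copied from the excerpt above, one per line.
sample = (
    "14/10/10 14:20:21 WARN storage.BlockManagerMasterActor: Removing "
    "BlockManager BlockManagerId(6, sparkmaster.company.com, 44906, 0) with no "
    "recent heart beats: 71967ms exceeds 45000ms\n"
    "14/10/10 14:55:31 ERROR cluster.YarnClusterScheduler: Lost executor 6 on "
    "sparkmaster.company.com: remote Akka client disassociated\n"
)

timeouts, lost = summarize_executor_loss(sample)
for executor, host, silent_ms, limit_ms in timeouts:
    print(f"executor {executor} on {host}: silent {silent_ms} ms (limit {limit_ms} ms)")
for executor, host, reason in lost:
    print(f"executor {executor} on {host} lost: {reason}")
```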

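If you are running on YARN, the lost task is rescheduled and processing
continues: in the log above, task 45.0 was lost with executor 6 and then
started again as 45.1 on sparknode1.company.com.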
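To confirm that every lost task was actually retried, you can pair each "Lost task" entry with a later "Starting task" entry for the same stage and task index. Below is a minimal sketch, assuming (from the excerpt, not from Spark documentation) that the number after the dot in "45.0" / "45.1" is the attempt number, and that each log entry sits on one line. The helper find_retries is hypothetical.

```python
import re

# Task attempts appear as "<index>.<attempt>" in the excerpt (45.0, then 45.1).
LOST_TASK_RE = re.compile(r"Lost task (\d+)\.(\d+) in stage (\d+\.\d+)")
START_TASK_RE = re.compile(r"Starting task (\d+)\.(\d+) in stage (\d+\.\d+)")


def find_retries(log_text):
    """Map each lost (stage, index, attempt) to True if a later attempt started."""
    retried = {}
    for line in log_text.splitlines():
        m = LOST_TASK_RE.search(line)
        if m:
            index, attempt, stage = m.groups()
            retried[(stage, int(index), int(attempt))] = False
            continue
        m = START_TASK_RE.search(line)
        if m:
            index, attempt, stage = m.groups()
            # Mark any earlier lost attempt of the same task as retried.
            for stage_id, task_index, lost_attempt in list(retried):
                if (stage_id, task_index) == (stage, int(index)) and int(attempt) > lost_attempt:
                    retried[(stage_id, task_index, lost_attempt)] = True
    return retried


log_excerpt = (
    "14/10/10 14:55:31 WARN scheduler.TaskSetManager: Lost task 45.0 in stage 4.0 "
    "(TID 543, sparkmaster.company.com): ExecutorLostFailure (executor lost)\n"
    "14/10/10 14:55:31 INFO scheduler.TaskSetManager: Starting task 45.1 in stage "
    "4.0 (TID 548, sparknode1.company.com, PROCESS_LOCAL, 948 bytes)\n"
)

print(find_retries(log_excerpt))
```

Any entry left as False points at a task whose retry never showed up in the log you scanned.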

You can try increasing the following properties in spark-defaults.conf:
spark.core.connection.ack.wait.timeout  3600
spark.core.connection.auth.wait.timeout 3600
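If you manage spark-defaults.conf with scripts, the two settings can be applied with a small helper. Below is a minimal sketch, assuming the whitespace-separated "key value" layout shown above; set_properties is a hypothetical name, not part of Spark, and the sample file contents are made up for illustration.

```python
def set_properties(conf_text, updates):
    """Return conf_text with each key in `updates` set to the given value.

    The first uncommented "key value" line for each key is rewritten in place;
    keys that are not present yet are appended at the end. Only
    whitespace-separated entries are recognized (not "key=value").
    """
    remaining = dict(updates)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            key = stripped.split(None, 1)[0]
            if key in remaining:
                out.append(f"{key} {remaining.pop(key)}")
                continue
        out.append(line)
    # Append keys that did not appear in the file.
    out.extend(f"{key} {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


TIMEOUTS = {
    "spark.core.connection.ack.wait.timeout": "3600",
    "spark.core.connection.auth.wait.timeout": "3600",
}

# Hypothetical existing file contents.
existing = "# cluster defaults\nspark.executor.memory 4g\nspark.core.connection.ack.wait.timeout 60\n"
updated = set_properties(existing, TIMEOUTS)
print(updated, end="")
```

To use it on a real file, read the file's text, pass it through set_properties, and write the result back; the file's location depends on your installation.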



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ExecutorLostFailure-executor-lost-tp14117p16126.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.