Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread Saisai Shao
I think you could check the YARN NodeManager log or the other Spark executor logs to see the details. The exception stacks you listed above are just the symptom, not the cause. Normally a few situations lead to a lost executor: 1. Killed by YARN because it exceeded its memory limit,
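To make cause 1 concrete: on YARN each executor runs in a container sized as the executor heap plus an off-heap overhead, and the NodeManager kills the container when its physical memory use goes past that total. A minimal sketch of the arithmetic, assuming the Spark 1.x default rule for `spark.yarn.executor.memoryOverhead` when it is not set explicitly: overhead = max(384 MB, factor * executor memory), where the factor was 0.07 in older 1.x releases and 0.10 in later ones (check the docs for your version). The helper names are hypothetical.

```python
# Hypothetical helpers: estimate the YARN container size for one Spark executor.
# Assumption: when spark.yarn.executor.memoryOverhead is unset, Spark 1.x uses
# max(min_overhead_mb, int(factor * executor_memory_mb)); the factor differs by release.

def container_request_mb(executor_memory_mb, overhead_mb=None,
                         factor=0.10, min_overhead_mb=384):
    """Return executor heap + overhead, i.e. the memory YARN enforces on the container."""
    if overhead_mb is None:
        # Default rule (assumed): a fraction of the heap, but never below the floor.
        overhead_mb = max(min_overhead_mb, int(factor * executor_memory_mb))
    return executor_memory_mb + overhead_mb


def exceeds_container(peak_physical_mb, executor_memory_mb, **kwargs):
    """True if the observed peak physical memory would exceed the container limit."""
    return peak_physical_mb > container_request_mb(executor_memory_mb, **kwargs)


if __name__ == "__main__":
    print(container_request_mb(4096))               # 4096 + max(384, 409) = 4505
    print(container_request_mb(4096, factor=0.07))  # 4096 + max(384, 286) = 4480
    print(exceeds_container(4600, 4096))            # True
```

YARN may also round the request up to its allocation increment (for example a multiple of `yarn.scheduler.minimum-allocation-mb`, depending on the scheduler), so the enforced limit can be somewhat higher than this estimate. If the NodeManager log shows the container being killed for memory, raising `spark.yarn.executor.memoryOverhead` is the usual first knob.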

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread patcharee
Hi again, below is the log from the executor: FetchFailed(BlockManagerId(4, compute-10-0.local, 38594), shuffleId=0, mapId=117, reduceId=117, message= org.apache.spark.shuffle.FetchFailedException: Failed to connect to compute-10-0.local/10.10.255.241:38594 at org.apache.spark.shuffle.hash.Blo
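The FetchFailed entry names the BlockManagerId of the executor that was serving the shuffle output (executor 4 on compute-10-0.local, port 38594). "Failed to connect" suggests that executor was already gone, so its logs, not the reporting executor's, are the ones likely to show the cause. A small sketch that pulls those fields out of such a line; the regex assumes exactly the format shown in this message, and `parse_fetch_failed` is a hypothetical helper.

```python
import re

# Assumes the FetchFailed(...) format shown in the executor log in this thread.
FETCH_FAILED = re.compile(
    r"FetchFailed\(BlockManagerId\((?P<executor>[^,]+),\s*(?P<host>[^,]+),\s*(?P<port>\d+)\),"
    r"\s*shuffleId=(?P<shuffle>\d+),\s*mapId=(?P<map>\d+),\s*reduceId=(?P<reduce>\d+)"
)


def parse_fetch_failed(line):
    """Return the remote executor/host/port and shuffle ids, or None if no match."""
    m = FETCH_FAILED.search(line)
    if m is None:
        return None
    return {
        "executor_id": m.group("executor").strip(),
        "host": m.group("host").strip(),
        "port": int(m.group("port")),
        "shuffle_id": int(m.group("shuffle")),
        "map_id": int(m.group("map")),
        "reduce_id": int(m.group("reduce")),
    }


line = ("FetchFailed(BlockManagerId(4, compute-10-0.local, 38594), shuffleId=0, "
        "mapId=117, reduceId=117")
print(parse_fetch_failed(line))
# {'executor_id': '4', 'host': 'compute-10-0.local', 'port': 38594,
#  'shuffle_id': 0, 'map_id': 117, 'reduce_id': 117}
```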

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread Akhil Das
Which version of Spark? It looks like you are hitting this one: https://issues.apache.org/jira/browse/SPARK-4516 Thanks, Best Regards On Wed, Jun 3, 2015 at 1:06 PM, patcharee wrote: > This is log I can get > 15/06/02 16:37:31 INFO shuffle.RetryingBlockFetcher: Retrying fetch (2/3) for 4 outst

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread patcharee
This is the log I can get: 15/06/02 16:37:31 INFO shuffle.RetryingBlockFetcher: Retrying fetch (2/3) for 4 outstanding blocks after 5000 ms 15/06/02 16:37:36 INFO client.TransportClientFactory: Found inactive connection to compute-10-3.local/10.10.255.238:33671, creating a new one. 15/06/02 16:37:3
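The "Retrying fetch (2/3) ... after 5000 ms" lines match the Netty shuffle client's retry settings, `spark.shuffle.io.maxRetries` (default 3) and `spark.shuffle.io.retryWait` (default 5 seconds). Once the retries are used up, the fetch fails and surfaces as a FetchFailedException. The sketch below only mimics that policy so the log is easier to read; it is not Spark's `RetryingBlockFetcher`, and `fetch_with_retries` is a hypothetical helper.

```python
import time


def fetch_with_retries(fetch, max_retries=3, retry_wait_s=5.0,
                       sleep=time.sleep, log=print):
    """Call fetch(); on a connection error, wait and retry up to max_retries times.

    Total attempts = 1 initial try + max_retries retries. After the last retry
    fails, the error is re-raised (in Spark this is where a FetchFailed appears).
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except OSError as exc:  # ConnectionError is a subclass of OSError
            attempt += 1
            if attempt > max_retries:
                raise
            log(f"Retrying fetch ({attempt}/{max_retries}) after "
                f"{int(retry_wait_s * 1000)} ms: {exc}")
            sleep(retry_wait_s)
```

Retrying only helps when the remote executor comes back; if it was killed, every retry fails the same way, which is why the root cause has to be found on the node that served the blocks.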

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread Akhil Das
You need to look into your executor/worker logs to see what's going on. Thanks, Best Regards On Wed, Jun 3, 2015 at 12:01 PM, patcharee wrote: > Hi, > What can be the cause of this ERROR cluster.YarnScheduler: Lost executor? How can I fix it? > Best, Patcharee
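One way to follow this advice: collect the container logs (for example with `yarn logs -applicationId <appId>` when log aggregation is enabled) and scan them for messages that point at a root cause rather than a symptom. The signature list below is an assumption, a starting point rather than an exhaustive catalogue, and `triage` is a hypothetical helper.

```python
# Substrings that commonly indicate why an executor died (assumed, not exhaustive).
SIGNATURES = [
    ("running beyond physical memory limits",
     "killed by YARN: physical memory limit exceeded"),
    ("running beyond virtual memory limits",
     "killed by YARN: virtual memory limit exceeded"),
    ("java.lang.OutOfMemoryError",
     "JVM heap exhausted inside the executor"),
    ("FetchFailedException",
     "shuffle fetch failed (often a symptom of another lost executor)"),
]


def triage(log_text):
    """Return the likely causes whose signatures appear in log_text, in list order."""
    return [cause for signature, cause in SIGNATURES if signature in log_text]
```

Memory-limit and OutOfMemoryError hits are causes; a FetchFailedException on its own is usually the downstream effect described earlier in the thread.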

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread Jeff Zhang
Node down, or container preempted? You need to check the executor log / NodeManager log for more info. On Wed, Jun 3, 2015 at 2:31 PM, patcharee wrote: > Hi, > What can be the cause of this ERROR cluster.YarnScheduler: Lost executor? How can I fix it? > Best, Patcharee
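These two guesses can usually be told apart by the container's exit status, which YARN reports when a container completes. A sketch mapping the special codes from Hadoop 2.x's `ContainerExitStatus` class to likely causes; the values are assumed from that class (the -103/-104 memory codes only exist in newer 2.x releases), so verify them against your Hadoop version. `classify_exit_status` is hypothetical.

```python
# Special exit statuses assumed from Hadoop 2.x ContainerExitStatus; verify locally.
YARN_EXIT_STATUS = {
    0: "success",
    -100: "aborted: released by the framework or lost (e.g. node failure)",
    -101: "disks failed on the NodeManager",
    -102: "preempted by the scheduler",
    -103: "killed: virtual memory limit exceeded",
    -104: "killed: physical memory limit exceeded",
}


def classify_exit_status(code):
    """Map a container exit status to a likely cause; other codes come from the JVM."""
    return YARN_EXIT_STATUS.get(
        code, f"process exit code {code}: check the executor's stderr/stdout")
```

A -100 points toward a lost node and -102 toward preemption; any positive code means the executor process itself exited, so its own logs hold the answer.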