You could check the YARN NodeManager log or the other Spark executor
logs to see the details. The exception stacks you listed above are
only the symptom, not the cause. Several situations commonly lead to
a lost executor:
1. Killed by YARN because it exceeded its memory limit,
Hi again,
Below is the log from the executor:
FetchFailed(BlockManagerId(4, compute-10-0.local, 38594), shuffleId=0,
mapId=117, reduceId=117, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to
compute-10-0.local/10.10.255.241:38594
at
org.apache.spark.shuffle.hash.Blo
Which version of Spark? It looks like you are hitting this issue:
https://issues.apache.org/jira/browse/SPARK-4516
Thanks
Best Regards
On Wed, Jun 3, 2015 at 1:06 PM, patcharee wrote:
> This is the log I can get:
>
> 15/06/02 16:37:31 INFO shuffle.RetryingBlockFetcher: Retrying fetch (2/3)
> for 4 outst
This is the log I can get:
15/06/02 16:37:31 INFO shuffle.RetryingBlockFetcher: Retrying fetch
(2/3) for 4 outstanding blocks after 5000 ms
15/06/02 16:37:36 INFO client.TransportClientFactory: Found inactive
connection to compute-10-3.local/10.10.255.238:33671, creating a new one.
15/06/02 16:37:3
You need to look into your executor/worker logs to see what's going on.
Thanks
Best Regards
On Wed, Jun 3, 2015 at 12:01 PM, patcharee wrote:
> Hi,
>
> What can be the cause of this ERROR cluster.YarnScheduler: Lost executor?
> How can I fix it?
>
> Best,
> Patcharee
>
>
Is the node down, or was the container preempted? You need to check the
executor log / NodeManager log for more info.
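
A minimal sketch of that log check, assuming the aggregated logs have
already been saved to a text file (for example with
`yarn logs -applicationId <application id>`). The function name
`classify_lines` and the pattern table are hypothetical, and the exact
wording of real log messages varies across Spark and YARN versions, so
treat the patterns as examples to adapt, not as a complete list.

```python
import re
from collections import Counter

# Hypothetical signature table: label -> regex.
# These patterns are assumptions based on commonly seen messages;
# adjust them to whatever your own logs actually contain.
SIGNATURES = {
    "memory limit exceeded": re.compile(
        r"running beyond (physical|virtual) memory limits", re.IGNORECASE
    ),
    "shuffle fetch failure": re.compile(
        r"FetchFailedException|Failed to connect to"
    ),
    "container preempted": re.compile(r"preempted", re.IGNORECASE),
    "executor lost": re.compile(r"Lost executor"),
}


def classify_lines(lines):
    """Count how many lines match each signature (at most once per line)."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

To use it, pass an open file object (or any iterable of lines) to
`classify_lines` and look at which labels dominate. A high
"shuffle fetch failure" count with no memory or preemption hits would
suggest looking at the remote executor that could not be reached.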
On Wed, Jun 3, 2015 at 2:31 PM, patcharee wrote:
> Hi,
>
> What can be the cause of this ERROR cluster.YarnScheduler: Lost executor?
> How can I fix it?
>
> Best,
> Patcharee
>
> -