Hi Greg,

Have you looked at the AM container logs? You may already know this, but you
can get them through the RM web UI or with:

yarn logs -applicationId <your app ID>
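
If the aggregated output is long, you can narrow it down to just the AM container, which is usually the first container of the attempt (the ID ending in _000001). This is only a sketch with placeholders to fill in from the RM web UI, and depending on your Hadoop version you may also need -nodeAddress alongside -containerId:

yarn logs -applicationId <your app ID> -containerId <AM container ID> -nodeAddress <NM host:port>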

If the AM throws an exception, the executors may not be started properly.
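
A quick way to check (just a sketch, assuming log aggregation is enabled so the logs are available once the app finishes) is to grep the aggregated output for stack traces:

yarn logs -applicationId <your app ID> | grep -i -A 20 "exception"

Any stack trace near the top of the AM log usually explains why the executors never registered.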

-Andrew



2014-10-02 9:47 GMT-07:00 Greg Hill <greg.h...@rackspace.com>:

>  I haven't run into this until today.  I spun up a fresh cluster to do
> some more testing, and it seems that every single executor fails because it
> can't connect to the driver.  This is in the YARN logs:
>
>  14/10/02 16:24:11 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@GATEWAY-1:60855/user/CoarseGrainedScheduler
>  14/10/02 16:24:11 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@DATANODE-3:58232] -> [akka.tcp://sparkDriver@GATEWAY-1:60855] disassociated! Shutting down.
>
>  And this is what shows up from the driver:
>
>  14/10/02 16:43:06 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@DATANODE-1:60341/user/Executor#1289950113] with ID 2
>  14/10/02 16:43:06 INFO util.RackResolver: Resolved DATANODE-1 to /rack/node8da83a04def73517bf437e95aeefa2469b1daf14
>  14/10/02 16:43:06 INFO cluster.YarnClientSchedulerBackend: Executor 2 disconnected, so removing it
>
> It doesn't appear to be a networking issue.  Networking works both
> directions and there's no firewall blocking ports.  Googling the issue, it
> sounds like the most common problem is overallocation of memory, but I'm
> not doing that.  I've got these settings for a 3 * 128GB node cluster:
>
>  spark.executor.instances            17
>  spark.executor.memory               12424m
>  spark.yarn.executor.memoryOverhead  3549
>
>  That makes it 6 * 15973 = 95838 MB per node, which is well beneath the
> 128GB limit.
>
>  Frankly I'm stumped.  It worked fine when I spun up a cluster last week,
> but now it doesn't.  The logs give me no indication as to what the problem
> actually is.  Any pointers to where else I might look?
>
>  Thanks in advance.
>
>  Greg
>
