Hi Greg,

Have you looked at the AM container logs? (You may already know this, but) you can get these through the RM web UI or with:

    yarn logs -applicationId <your app ID>
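For example (a sketch; the application ID below is made up, and yours will show in the RM UI or in `yarn application -list`):

    # list applications to find the ID (requires log aggregation to be
    # enabled; `yarn logs` only works once the app has finished)
    yarn application -list -appStates ALL
    # pull the aggregated container logs, AM container included
    yarn logs -applicationId application_1412263045217_0001 > app.log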
If the AM throws an exception, the executors may not start properly.

-Andrew

2014-10-02 9:47 GMT-07:00 Greg Hill <greg.h...@rackspace.com>:

> I haven't run into this until today. I spun up a fresh cluster to do
> some more testing, and it seems that every single executor fails because
> it can't connect to the driver. This is in the YARN logs:
>
> 14/10/02 16:24:11 INFO executor.CoarseGrainedExecutorBackend: Connecting
> to driver: akka.tcp://sparkDriver@GATEWAY-1:60855/user/CoarseGrainedScheduler
> 14/10/02 16:24:11 ERROR executor.CoarseGrainedExecutorBackend: Driver
> Disassociated [akka.tcp://sparkExecutor@DATANODE-3:58232] ->
> [akka.tcp://sparkDriver@GATEWAY-1:60855] disassociated! Shutting down.
>
> And this is what shows up from the driver:
>
> 14/10/02 16:43:06 INFO cluster.YarnClientSchedulerBackend: Registered
> executor:
> Actor[akka.tcp://sparkExecutor@DATANODE-1:60341/user/Executor#1289950113]
> with ID 2
> 14/10/02 16:43:06 INFO util.RackResolver: Resolved DATANODE-1 to
> /rack/node8da83a04def73517bf437e95aeefa2469b1daf14
> 14/10/02 16:43:06 INFO cluster.YarnClientSchedulerBackend: Executor 2
> disconnected, so removing it
>
> It doesn't appear to be a networking issue. Networking works in both
> directions and there's no firewall blocking ports. Googling the issue, it
> sounds like the most common cause is overallocating memory, but I'm not
> doing that. I've got these settings for a 3-node cluster with 128 GB per
> node:
>
> spark.executor.instances            17
> spark.executor.memory               12424m
> spark.yarn.executor.memoryOverhead  3549
>
> With 17 executors over 3 nodes, that's at most 6 executors per node, or
> 6 * 15973 = 95838 MB, which is well beneath the 128 GB limit.
>
> Frankly, I'm stumped. It worked fine when I spun up a cluster last week,
> but now it doesn't. The logs give me no indication as to what the problem
> actually is. Any pointers to where else I might look?
>
> Thanks in advance.
>
> Greg
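A quick way to double-check the "networking works both directions" claim against the quoted logs (a sketch; the hosts and ports are taken from the log lines above, and note the driver's scheduler port is ephemeral, so 60855 changes on every run):

    # from a data node, check the driver's scheduler port is reachable
    nc -zv GATEWAY-1 60855
    # and in the other direction, the executor's port from the gateway
    nc -zv DATANODE-3 58232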
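For reference, the same executor sizing expressed as a submit command (a sketch, assuming a Spark 1.x yarn-client deployment; the class and jar names are placeholders). Per executor, YARN allocates 12424 + 3549 = 15973 MB, giving the worst-case per-node footprint of 6 * 15973 = 95838 MB quoted above:

    # placeholder application class and jar; Spark 1.x flag names
    spark-submit \
      --master yarn-client \
      --num-executors 17 \
      --executor-memory 12424m \
      --conf spark.yarn.executor.memoryOverhead=3549 \
      --class com.example.MyApp \
      myapp.jar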