I think your config may be the issue then. It sounds like 1 server is in a different YARN group, configured to report far less resource than it actually has.
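If you want to double-check, the per-node limits YARN advertises come from these two properties in each NodeManager's yarn-site.xml (the property names are the standard YARN ones; the values below are only an example of what a 32-core/64G box might look like after leaving some headroom for the OS):

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>57344</value>   <!-- memory YARN may hand out on this node, in MB -->
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>32</value>      <!-- vcores YARN may hand out on this node -->
  </property>

If the odd node (or the Cloudera Manager role group it belongs to) has something like 1 vcore / 1024 MB here, YARN can never fit a 32-core / 32G executor on it, which would match the 1 core / 1 GB you saw in the RM UI.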
On Wed, Nov 19, 2014 at 5:27 PM, Alan Prando <a...@scanboo.com.br> wrote:
> Hi all!
>
> Thanks for answering!
>
> @Sean, I tried to run with 30 executor-cores, and 1 machine is still not
> processing anything.
> @Vanzin, I checked the RM's web UI, and all nodes were detected and "RUNNING".
> The interesting fact is that the available memory and available cores of 1
> node were different from the other 2, with just 1 available core and 1
> available gig of RAM.
>
> @All, I created a new cluster with 10 slaves and 1 master, and now 9 of my
> slaves are working, and 1 is still not processing.
>
> It's fine by me! I'm just wondering why YARN's doing it... Does anyone know
> the answer?
>
> 2014-11-18 16:18 GMT-02:00 Sean Owen <so...@cloudera.com>:
>
>> My guess is you're asking for all cores of all machines, but the driver
>> needs at least one core, so one executor is unable to find a machine to
>> fit on.
>>
>> On Nov 18, 2014 7:04 PM, "Alan Prando" <a...@scanboo.com.br> wrote:
>>>
>>> Hi Folks!
>>>
>>> I'm running Spark on a YARN cluster installed with Cloudera Manager
>>> Express.
>>> The cluster has 1 master and 3 slaves, each machine with 32 cores and
>>> 64G RAM.
>>>
>>> My Spark job is working fine; however, it seems that just 2 of the 3
>>> slaves are working (htop shows 2 slaves working at 100% on 32 cores,
>>> and 1 slave without any processing).
>>>
>>> I'm using this command:
>>> ./spark-submit --master yarn --num-executors 3 --executor-cores 32
>>> --executor-memory 32g feature_extractor.py -r 390
>>>
>>> Additionally, Spark's log shows communication with 2 slaves only:
>>> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
>>> Actor[akka.tcp://sparkExecutor@ip-172-31-13-180.ec2.internal:33177/user/Executor#-113177469]
>>> with ID 1
>>> 14/11/18 17:19:38 INFO RackResolver: Resolved
>>> ip-172-31-13-180.ec2.internal to /default
>>> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
>>> Actor[akka.tcp://sparkExecutor@ip-172-31-13-179.ec2.internal:51859/user/Executor#-323896724]
>>> with ID 2
>>> 14/11/18 17:19:38 INFO RackResolver: Resolved
>>> ip-172-31-13-179.ec2.internal to /default
>>> 14/11/18 17:19:38 INFO BlockManagerMasterActor: Registering block manager
>>> ip-172-31-13-180.ec2.internal:50959 with 16.6 GB RAM
>>> 14/11/18 17:19:39 INFO BlockManagerMasterActor: Registering block manager
>>> ip-172-31-13-179.ec2.internal:53557 with 16.6 GB RAM
>>> 14/11/18 17:19:51 INFO YarnClientSchedulerBackend: SchedulerBackend is
>>> ready for scheduling beginning after waiting
>>> maxRegisteredResourcesWaitingTime: 30000(ms)
>>>
>>> Is there a configuration to run a Spark job on the YARN cluster with
>>> all slaves?
>>>
>>> Thanks in advance! =]
>>>
>>> ---
>>> Regards
>>> Alan Vidotti Prando.
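P.S. If you spin up another cluster and want to spot the odd node out quickly, the YARN CLI shows the same per-node numbers as the RM web UI (these are standard YARN commands; the node id argument is just a placeholder, copy it from the list output):

  yarn node -list
  yarn node -status <node-id-from-the-list-output>

Comparing Memory-Capacity and CPU-Capacity across the nodes should show which one is registered with only 1 vcore / 1 GB.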