Hi all,

Recently we started having issues with one of the background processing scripts we run on Spark. The cluster runs only two jobs: one runs for days, the other usually takes a couple of hours, and both are started on a cron schedule. The cluster is small: just 2 slaves, 24 cores and 25.4 GB of memory in total. Each job takes 6 cores and 6 GB of memory per worker, so when both jobs are running that's 12 of the 24 cores and 24 GB of the 25.4 GB.
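To make the numbers concrete, each job is submitted with settings along these lines (the app name is a placeholder and I'm quoting the relevant keys from memory, so the exact submit code may differ slightly):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("background-job")        // placeholder name
      .set("spark.cores.max", "6")         // 6 cores per job across the whole cluster
      .set("spark.executor.memory", "6g")  // 6 GB, and note this is per executor, not per job
    val sc = new SparkContext(conf)

Normally each job gets one executor per slave, i.e. 2 executors × 6 GB = 12 GB per job.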
But sometimes I see this:

https://www.dropbox.com/s/6uad4hrchqpihp4/Screen%20Shot%202016-01-25%20at%201.16.19%20PM.png

So basically the long-running job has somehow occupied the whole cluster, and the short one can't make any progress because there are no free resources. This is what I see in the logs:

> 16/01/25 21:26:48 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

When I log in to the slaves I see this:

slave 1:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler --executor-id 450 --hostname 10.191.4.151 --cores 1 --app-id app-20160124152439-1468 --worker-url akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler --executor-id 451 --hostname 10.191.4.151 --cores 1 --app-id app-20160124152439-1468 --worker-url akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker

slave 2:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler --executor-id 1 --hostname 10.253.142.59 --cores 3 --app-id app-20160124152439-1468 --worker-url akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler --executor-id 448 --hostname 10.253.142.59 --cores 1 --app-id app-20160124152439-1468 --worker-url akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker

So somehow Spark created 4 executors for the long job, 2 on each machine: 1 core + 1 core on slave 1 and 3 cores + 1 core on slave 2, for a total of 6 cores. But because the 6 GB setting is per executor, the job ends up occupying 4 × 6 GB = 24 GB instead of the expected 12 GB (2 executors with 3 cores each), which blocks the other Spark job.

My wild guess is that for some reason one executor of the long job failed, so the job became 3 cores short and asked the scheduler for 3 more cores, and the scheduler distributed them evenly across the slaves (2 cores + 1 core), but those new executors couldn't start until the short job finished, because the short job held the rest of the memory. That would explain the 3 + 1 split on one slave, but it doesn't explain the 1 + 1 on the other.

Has anyone experienced anything similar? Any ideas on how to avoid it?

Thanks,
Mikhail
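P.S. One idea I'm considering but haven't tried yet (so treat this as a guess, not a known fix): pinning the executor size with spark.executor.cores so the master can't slice a job's 6 cores into more than two executors. Roughly:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("long-running-job")      // placeholder name
      .set("spark.cores.max", "6")         // total cores per job, as before
      .set("spark.executor.cores", "3")    // fixed executor size, so at most 6 / 3 = 2 executors
      .set("spark.executor.memory", "6g")  // memory then tops out at 2 × 6 GB = 12 GB per job
    val sc = new SparkContext(conf)

If a failed executor gets replaced, the replacement should still have the 3-core shape, so the job should never grow past 12 GB. Does that sound reasonable, or is there a better way to handle this?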