Hi, I'm running my program on a single large-memory, many-core machine (64 cores, 1 TB RAM). To avoid having huge JVMs, I want to use several worker processes, each using 8 cores (i.e., via SPARK_WORKER_INSTANCES). With 2 worker instances everything works fine, but when I try 4 or more worker instances and start the spark-shell, the workers throw the following exception:
14/03/24 08:18:51 ERROR ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-3] shutting down ActorSystem [spark]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)
        at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
        at scala.concurrent.forkjoin.ForkJoinPool.signalWork(ForkJoinPool.java:1966)
        at scala.concurrent.forkjoin.ForkJoinPool.externalPush(ForkJoinPool.java:1829)
        at scala.concurrent.forkjoin.ForkJoinPool.execute(ForkJoinPool.java:2955)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool.execute(AbstractDispatcher.scala:374)
        at akka.dispatch.ExecutorServiceDelegate$class.execute(ThreadPoolBuilder.scala:212)
        at akka.dispatch.Dispatcher$LazyExecutorServiceDelegate.execute(Dispatcher.scala:43)
        at akka.dispatch.Dispatcher.registerForExecution(Dispatcher.scala:118)
        at akka.dispatch.Dispatcher.dispatch(Dispatcher.scala:59)
        at akka.actor.dungeon.Dispatch$class.sendMessage(Dispatch.scala:120)
        at akka.actor.ActorCell.sendMessage(ActorCell.scala:338)
        at akka.actor.Cell$class.sendMessage(ActorCell.scala:259)
        at akka.actor.ActorCell.sendMessage(ActorCell.scala:338)
        at akka.actor.LocalActorRef.$bang(ActorRef.scala:389)
        at akka.actor.Scheduler$$anon$8.run(Scheduler.scala:62)
        at akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

*The config file spark-env.sh contains:*

    export JAVA_HOME=/usr/java/jdk1.7.0_09
    export PATH=/usr/java/jdk1.7.0_09/bin/:$PATH
    export SPARK_JAVA_OPTS="-Dspark.executor.memory=80g -Dspark.local.dir=/lfs/local/0/yonathan/tmp -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.kryo.registrator=org.apache.spark.graphx.GraphKryoRegistrator -Xms80g -Xmx80g -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
    export SPARK_WORKER_CORES=8
    export SPARK_WORKER_MEMORY=80g
    export SPARK_EXECUTOR_MEMORY=80g
    export SPARK_DRIVER_MEMORY=10g
    export SPARK_DAEMON_MEMORY=10g
    export SPARK_WORKER_INSTANCES=4
    export SPARK_DAEMON_JAVA_OPTS="-Xms10g -Xmx10g"

I use *Spark 0.9.0*.

I would appreciate any help or advice on the subject. Thanks!
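P.S. From what I can tell, "unable to create new native thread" often points at an OS-level limit on threads/processes rather than Java heap exhaustion, which would explain why it only appears once enough workers are running. Below is a rough diagnostic sketch of the checks one could run before starting the workers; it assumes a Linux host, and the specific limits and defaults vary by distro:

    # Per-user limit on processes/threads for the account running the
    # workers (a common default is 1024, which 4 large JVMs can exhaust):
    ulimit -u

    # System-wide ceiling on the number of threads:
    cat /proc/sys/kernel/threads-max

    # How many threads the Spark user currently owns (each JVM thread
    # shows up as one LWP under Linux):
    ps -u "$USER" -L --no-headers | wc -l

    # If the per-user limit is the bottleneck, raising it in the shell
    # that launches the workers might help, e.g.:
    #   ulimit -u 32768

I'm not certain this is the cause in my setup, so pointers to a misconfiguration in the spark-env.sh above would be equally welcome.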