Hi guys, here are some lines from the log file just before the OOM. They don't look that helpful, so let me know if there's anything else I should send. I am running in standalone mode.
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:00:36 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-52] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:Exception in thread "qtp2057079871-30" java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:00:07 WARN AbstractNioSelector: Unexpected exception in the selector loop.
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:02:51 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-8] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:03:22 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-38] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:03:22 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-6] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:03:22 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-43] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:03:22 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-13] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:03:22 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-5] shutting down ActorSystem [sparkMaster]
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java heap space
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5-14/10/22 05:03:22 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-12] shutting down ActorSystem [sparkMaster]

On Thu, Oct 23, 2014 at 2:10 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

> hmmmm…
>
> My observation is that the master in Spark 1.1 has a higher frequency of GC.
>
> Also, before 1.1 I never encountered GC overtime in the Master; after upgrading
> to 1.1 I have hit it twice (we upgraded soon after the 1.1 release).
>
> Best,
>
> --
> Nan Zhu
>
> On Thursday, October 23, 2014 at 1:08 PM, Andrew Or wrote:
>
> Yeah, as Sameer commented, there is unfortunately no equivalent
> `SPARK_MASTER_MEMORY` that you can set. You can work around this by
> starting the master and the slaves separately with different settings of
> SPARK_DAEMON_MEMORY each time.
>
> AFAIK there haven't been any major changes in the standalone master in
> 1.1.0, so I don't see an immediate explanation for what you're observing.
> In general the Spark master doesn't use that much memory, and even if
> there are many applications it will discard the old ones appropriately, so
> unless you have a ton (like thousands) of concurrently running
> applications connecting to it, there's little likelihood of it OOMing. At
> least that's my understanding.
>
> -Andrew
>
> 2014-10-22 15:51 GMT-07:00 Sameer Farooqui <same...@databricks.com>:
>
> Hi Keith,
>
> It would be helpful if you could post the error message.
>
> Are you running Spark in Standalone mode or with YARN?
>
> In general, the Spark Master is only used for scheduling, and it should
> be fine with the default setting of 512 MB RAM.
>
> Is it actually the Spark Driver's memory that you intended to change?
>
>
> *++ If in Standalone mode ++*
> You're right that SPARK_DAEMON_MEMORY sets the memory to allocate to the
> Spark Master, Worker and even HistoryServer daemons together.
>
> SPARK_WORKER_MEMORY is slightly confusing. In Standalone mode, it is the
> amount of memory that a worker advertises as available for drivers to
> launch executors. The sum of the memory used by executors spawned from a
> worker cannot exceed SPARK_WORKER_MEMORY.
>
> Unfortunately, I'm not aware of a way to set the memory for the Master
> and Worker individually, other than launching them manually. You can also
> try setting the config differently in each machine's spark-env.sh file.
>
>
> *++ If in YARN mode ++*
> In YARN, there is no setting for SPARK_DAEMON_MEMORY, which is why it
> appears only in the Standalone documentation.
>
> Remember that in YARN mode there is no Spark Worker; instead, the YARN
> NodeManagers launch the Executors. And in YARN there is no need to run a
> Spark Master JVM, since the YARN ResourceManager takes care of the
> scheduling.
>
> So, with YARN, use SPARK_EXECUTOR_MEMORY to set the Executor's memory and
> SPARK_DRIVER_MEMORY to set the Driver's memory.
>
> Just an FYI - for compatibility's sake, even in YARN mode there is a
> setting for SPARK_WORKER_MEMORY, but it has been deprecated. If you do set
> it, it just does the same thing as setting SPARK_EXECUTOR_MEMORY would
> have done.
>
>
> - Sameer
>
>
> On Wed, Oct 22, 2014 at 1:46 PM, Keith Simmons <ke...@pulse.io> wrote:
>
> We've been getting some OOMs from the Spark master since upgrading to
> Spark 1.1.0. I've found SPARK_DAEMON_MEMORY, but that also seems to
> increase the worker heap, which as far as I know is fine. Is there any
> setting which *only* increases the master heap size?
>
> Keith
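
In case it helps anyone else hitting this, here's a minimal sketch of the
per-machine workaround Andrew and Sameer described, assuming the standard
standalone sbin scripts; the 2g value is just a placeholder I picked, not
anything confirmed in this thread:

    # conf/spark-env.sh on the master machine only
    # (2g is an arbitrary example value)
    SPARK_DAEMON_MEMORY=2g

    # conf/spark-env.sh on the worker machines: leave SPARK_DAEMON_MEMORY
    # unset so the Worker daemons keep the 512m default.

    # Then bring the daemons up with the usual standalone scripts, run from
    # the master machine. start-slaves.sh SSHes into each host listed in
    # conf/slaves, and each worker sources its own local spark-env.sh, so
    # only the Master should pick up the larger heap.
    ./sbin/start-master.sh
    ./sbin/start-slaves.sh

If I've misunderstood how the workers pick up spark-env.sh, corrections welcome.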