Re: Lost executors

2014-11-20 Thread Pala M Muthaia
Just to close the loop, it seems no issues pop up when I submit the job using 'spark-submit' so that the driver process also runs in a container on the YARN cluster. In the above, the driver was running on the gateway machine through which the job was submitted, which led to quite a few issues. O

Re: Lost executors

2014-11-18 Thread Pala M Muthaia
Sandy, Good point - I forgot about NM logs. When I looked up the NM logs, I only see the following statements that align with the driver-side log about lost executors. Many executors show the same log statement at the same time, so it seems like the decision to kill many if not all executors happe

Re: Lost executors

2014-11-18 Thread Sandy Ryza
Hi Pala, Do you have access to your YARN NodeManager logs? Are you able to check whether they report killing any containers for exceeding memory limits? -Sandy On Tue, Nov 18, 2014 at 1:54 PM, Pala M Muthaia wrote: > Hi, > > I am using Spark 1.0.1 on Yarn 2.5, and doing everything through spa
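For anyone following the thread: one quick way to check Sandy's suggestion is to grep the NodeManager logs for the message YARN emits when it kills a container for exceeding its memory limit. The log path varies by distribution, so the sketch below just simulates a sample log line to show what to look for:

```shell
# Hypothetical sample of the NodeManager log line YARN writes when it kills
# a container for exceeding memory limits; the real path to grep would be
# something like /var/log/hadoop-yarn/ on your NodeManager hosts.
echo "WARN ContainersMonitorImpl: Container [pid=1234] is running beyond physical memory limits. Killing container." > /tmp/nm-sample.log
grep -c "beyond physical memory limits" /tmp/nm-sample.log  # prints 1
```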

Re: Lost executors

2014-08-13 Thread Andrew Or
Hi Ravi, Setting SPARK_MEMORY doesn't do anything. I believe you confused it with SPARK_MEM, which is now deprecated. You should set SPARK_EXECUTOR_MEMORY instead, or "spark.executor.memory" as a config in conf/spark-defaults.conf. Assuming you haven't set the executor memory through a different m
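For reference, the config-file form Andrew mentions is a line in conf/spark-defaults.conf (the 4g value below is just an illustrative placeholder; pick a size that fits your containers):

```properties
# conf/spark-defaults.conf — per-executor JVM heap size (placeholder value)
spark.executor.memory  4g
```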

Re: Lost executors

2014-08-13 Thread rpandya
I'm running Spark 1.0.1 with SPARK_MEMORY=60g, so 4 executors at that size would indeed run out of memory (the machine has 110GB). And in fact they would get repeatedly restarted and killed until eventually Spark gave up. I'll try with a smaller limit, but it'll be a while - somehow my HDFS got se
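The arithmetic behind the over-commit is worth spelling out, using the figures from the message above:

```python
# Figures from the message above: 4 executors, 60 GB heap each,
# on a machine with 110 GB of RAM.
executors = 4
heap_gb = 60
machine_ram_gb = 110

requested_gb = executors * heap_gb  # heap alone, ignoring JVM overhead
print(requested_gb, requested_gb > machine_ram_gb)  # → 240 True
```

Even before counting JVM and OS overhead, the requested heap is more than double the physical RAM, so the repeated kills are expected.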

Re: Lost executors

2014-08-13 Thread Andrew Or
To add to the pile of information we're asking you to provide, what version of Spark are you running? 2014-08-13 11:11 GMT-07:00 Shivaram Venkataraman : > If the JVM heap size is close to the memory limit the OS sometimes kills > the process under memory pressure. I've usually found that lowerin

Re: Lost executors

2014-08-13 Thread Shivaram Venkataraman
If the JVM heap size is close to the memory limit the OS sometimes kills the process under memory pressure. I've usually found that lowering the executor memory size helps. Shivaram On Wed, Aug 13, 2014 at 11:01 AM, Matei Zaharia wrote: > What is your Spark executor memory set to? (You can see

Re: Lost executors

2014-08-13 Thread Matei Zaharia
What is your Spark executor memory set to? (You can see it in Spark's web UI at http://:4040 under the executors tab). One thing to be aware of is that the JVM never really releases memory back to the OS, so it will keep filling up to the maximum heap size you set. Maybe 4 executors with that mu

Re: Lost executors

2014-08-13 Thread rpandya
After a lot of grovelling through logs, I found out that the Nagios monitor process detected that the machine was almost out of memory, and killed the SNAP executor process. So why is the machine running out of memory? Each node has 128GB of RAM, 4 executors, about 40GB of data. It did run out of

Re: Lost executors

2014-08-08 Thread rpandya
Hi Avishek, I'm running on a manual cluster setup, and all the code is Scala. The load averages don't seem high when I see these failures (about 12 on a 16-core machine). Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p11819.htm

Re: Lost executors

2014-08-08 Thread Avishek Saha
Same here Ravi. See my post on a similar thread. Are you running on YARN client? On Aug 7, 2014 2:56 PM, "rpandya" wrote: > I'm running into a problem with executors failing, and it's not clear > what's > causing it. Any suggestions on how to diagnose & fix it would be > appreciated. > > There a

Re: Lost executors

2014-07-23 Thread Eric Friedman
And... PEBCAK I mistakenly believed I had set PYSPARK_PYTHON to a python 2.7 install, but it was on a python 2.6 install on the remote nodes, hence incompatible with what the master was sending. Have set this to point to the correct version everywhere and it works. Apologies for the false alarm.
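For anyone hitting the same mismatch: PYSPARK_PYTHON must point at the same interpreter version on every node. One way to ensure that (the path below is a placeholder) is to export it in conf/spark-env.sh on all nodes:

```shell
# conf/spark-env.sh — placeholder path; must resolve to the same Python
# version on the driver and on every worker node.
export PYSPARK_PYTHON=/usr/bin/python2.7
```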

Re: Lost executors

2014-07-23 Thread Eric Friedman
hi Andrew, Thanks for your note. Yes, I see a stack trace now. It seems to be an issue with python interpreting a function I wish to apply to an RDD. The stack trace is below. The function is a simple factorial: def f(n): if n == 1: return 1 return n * f(n-1) and I'm trying to use it lik
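Reconstructed from the fragment above, the function looks like the sketch below. Spark itself is omitted so the sketch stays self-contained; in the real job this would be something like sc.parallelize(range(1, 6)).map(f).collect():

```python
def f(n):
    # Simple recursive factorial, as in the message above.
    # Note the base case is n == 1, so it assumes n >= 1.
    if n == 1:
        return 1
    return n * f(n - 1)

# Plain-Python stand-in for rdd.map(f) on a small range.
results = [f(n) for n in range(1, 6)]
print(results)  # → [1, 2, 6, 24, 120]
```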

Re: Lost executors

2014-07-23 Thread Andrew Or
Hi Eric, Have you checked the executor logs? It is possible they died because of some exception, and the message you see is just a side effect. Andrew 2014-07-23 8:27 GMT-07:00 Eric Friedman : > I'm using spark 1.0.1 on a quite large cluster, with gobs of memory, etc. > Cluster resources are