Re: Lost executors

2014-11-20 Thread Pala M Muthaia
Just to close the loop, it seems no issues pop up when I submit the job using 'spark-submit' so that the driver process also runs in a container on the YARN cluster. In the above, the driver was running on the gateway machine through which the job was submitted, which led to quite a few issues. O

Re: Lost executors

2014-11-18 Thread Pala M Muthaia
Sandy, Good point - I forgot about NM logs. When I looked up the NM logs, I only see the following statements that align with the driver-side log about lost executors. Many executors show the same log statement at the same time, so it seems like the decision to kill many if not all executors happe

Re: Lost executors

2014-11-18 Thread Sandy Ryza
Hi Pala, Do you have access to your YARN NodeManager logs? Are you able to check whether they report killing any containers for exceeding memory limits? -Sandy On Tue, Nov 18, 2014 at 1:54 PM, Pala M Muthaia wrote: > Hi, > > I am using Spark 1.0.1 on Yarn 2.5, and doing everything through spa
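For anyone following the thread: one quick way to check Sandy's suggestion is to grep the NodeManager logs for the message YARN emits when it kills a container for exceeding its memory limit. The log path varies by distribution, so the sketch below just simulates a sample log line to show what to look for:

```shell
# Hypothetical sample of the NodeManager log line YARN writes when it kills
# a container for exceeding memory limits; the real path to grep would be
# something like /var/log/hadoop-yarn/ on your NodeManager hosts.
echo "WARN ContainersMonitorImpl: Container [pid=1234] is running beyond physical memory limits. Killing container." > /tmp/nm-sample.log
grep -c "beyond physical memory limits" /tmp/nm-sample.log  # prints 1
```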

Re: Lost executors

2014-08-13 Thread Andrew Or
Hi Ravi, Setting SPARK_MEMORY doesn't do anything. I believe you confused it with SPARK_MEM, which is now deprecated. You should set SPARK_EXECUTOR_MEMORY instead, or "spark.executor.memory" as a config in conf/spark-defaults.conf. Assuming you haven't set the executor memory through a different m
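For reference, the config-file form Andrew mentions is a line in conf/spark-defaults.conf (the 4g value below is just an illustrative placeholder; pick a size that fits your containers):

```properties
# conf/spark-defaults.conf — per-executor JVM heap size (placeholder value)
spark.executor.memory  4g
```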

Re: Lost executors

2014-08-13 Thread rpandya
I'm running Spark 1.0.1 with SPARK_MEMORY=60g, so 4 executors at that size would indeed run out of memory (the machine has 110GB). And in fact they would get repeatedly restarted and killed until eventually Spark gave up. I'll try with a smaller limit, but it'll be a while - somehow my HDFS got se
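The arithmetic behind the over-commit is worth spelling out, using the figures from the message above:

```python
# Figures from the message above: 4 executors, 60 GB heap each,
# on a machine with 110 GB of RAM.
executors = 4
heap_gb = 60
machine_ram_gb = 110

requested_gb = executors * heap_gb  # heap alone, ignoring JVM overhead
print(requested_gb, requested_gb > machine_ram_gb)  # → 240 True
```

Even before counting JVM and OS overhead, the requested heap is more than double the physical RAM, so the repeated kills are expected.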

Re: Lost executors

2014-08-13 Thread Andrew Or
To add to the pile of information we're asking you to provide, what version of Spark are you running? 2014-08-13 11:11 GMT-07:00 Shivaram Venkataraman : > If the JVM heap size is close to the memory limit the OS sometimes kills > the process under memory pressure. I've usually found that lowerin

Re: Lost executors

2014-08-13 Thread Shivaram Venkataraman
If the JVM heap size is close to the memory limit the OS sometimes kills the process under memory pressure. I've usually found that lowering the executor memory size helps. Shivaram On Wed, Aug 13, 2014 at 11:01 AM, Matei Zaharia wrote: > What is your Spark executor memory set to? (You can see

Re: Lost executors

2014-08-13 Thread Matei Zaharia
What is your Spark executor memory set to? (You can see it in Spark's web UI at http://:4040 under the executors tab). One thing to be aware of is that the JVM never really releases memory back to the OS, so it will keep filling up to the maximum heap size you set. Maybe 4 executors with that mu

Re: Lost executors

2014-08-13 Thread rpandya
After a lot of grovelling through logs, I found out that the Nagios monitor process detected that the machine was almost out of memory, and killed the SNAP executor process. So why is the machine running out of memory? Each node has 128GB of RAM, 4 executors, about 40GB of data. It did run out of

Re: Lost executors

2014-08-08 Thread rpandya
Hi Avishek, I'm running on a manual cluster setup, and all the code is Scala. The load averages don't seem high when I see these failures (about 12 on a 16-core machine). Ravi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-executors-tp11722p11819.htm

Re: Lost executors

2014-08-08 Thread Avishek Saha
Same here Ravi. See my post on a similar thread. Are you running on YARN client? On Aug 7, 2014 2:56 PM, "rpandya" wrote: > I'm running into a problem with executors failing, and it's not clear > what's > causing it. Any suggestions on how to diagnose & fix it would be > appreciated. > > There a

Re: Lost executors

2014-07-23 Thread Eric Friedman
And... PEBCAK I mistakenly believed I had set PYSPARK_PYTHON to a python 2.7 install, but it was on a python 2.6 install on the remote nodes, hence incompatible with what the master was sending. Have set this to point to the correct version everywhere and it works. Apologies for the false alarm.
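For anyone hitting the same mismatch: PYSPARK_PYTHON must point at the same interpreter version on every node. One way to ensure that (the path below is a placeholder) is to export it in conf/spark-env.sh on all nodes:

```shell
# conf/spark-env.sh — placeholder path; must resolve to the same Python
# version on the driver and on every worker node.
export PYSPARK_PYTHON=/usr/bin/python2.7
```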

Re: Lost executors

2014-07-23 Thread Eric Friedman
hi Andrew, Thanks for your note. Yes, I see a stack trace now. It seems to be an issue with python interpreting a function I wish to apply to an RDD. The stack trace is below. The function is a simple factorial: def f(n): if n == 1: return 1 return n * f(n-1) and I'm trying to use it lik
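Reconstructed from the fragment above, the function looks like the sketch below. Spark itself is omitted so the sketch stays self-contained; in the real job this would be something like sc.parallelize(range(1, 6)).map(f).collect():

```python
def f(n):
    # Simple recursive factorial, as in the message above.
    # Note the base case is n == 1, so it assumes n >= 1.
    if n == 1:
        return 1
    return n * f(n - 1)

# Plain-Python stand-in for rdd.map(f) on a small range.
results = [f(n) for n in range(1, 6)]
print(results)  # → [1, 2, 6, 24, 120]
```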

Re: Lost executors

2014-07-23 Thread Andrew Or
Hi Eric, Have you checked the executor logs? It is possible they died because of some exception, and the message you see is just a side effect. Andrew 2014-07-23 8:27 GMT-07:00 Eric Friedman : > I'm using spark 1.0.1 on a quite large cluster, with gobs of memory, etc. > Cluster resources are