Just to close the loop: no issues come up when I submit the job with
spark-submit so that the driver process also runs in a container on the YARN
cluster.
In the setup described above, the driver was running on the gateway machine
through which the job was submitted, and that led to quite a few issues.
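For anyone else hitting this, a minimal sketch of the two submission modes (the class name and jar below are placeholders):

  # client mode: the driver runs on the gateway machine doing the submit
  spark-submit --master yarn-client --class com.example.MyApp my-app.jar

  # cluster mode: the driver runs inside a YARN container on the cluster
  spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar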
Sandy,
Good point - I forgot about NM logs.
When I looked up the NM logs, I only saw the following statements that
align with the driver-side log about the lost executor. Many executors show the
same log statement at the same time, so it seems like the decision to kill
many if not all executors happe
Hi Pala,
Do you have access to your YARN NodeManager logs? Are you able to check
whether they report killing any containers for exceeding memory limits?
-Sandy
On Tue, Nov 18, 2014 at 1:54 PM, Pala M Muthaia wrote:
> Hi,
>
> I am using Spark 1.0.1 on YARN 2.5, and doing everything through spa
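For reference, a sketch of one way to check for that (the application id and log path below are placeholders, and the NodeManager log location depends on the install): YARN's kill message usually mentions the container running beyond physical memory limits, so it can be grepped for in the aggregated application logs (if log aggregation is enabled) or in the NodeManager log itself.

  # aggregated container logs for the application
  yarn logs -applicationId application_1416300000000_0001 | grep -i "beyond physical memory"

  # or the NodeManager log directly on a worker node
  grep -i "beyond physical memory" /var/log/hadoop-yarn/*nodemanager*.log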
Hi Ravi,
Setting SPARK_MEMORY doesn't do anything. I believe you confused it with
SPARK_MEM, which is now deprecated. You should set SPARK_EXECUTOR_MEMORY
instead, or "spark.executor.memory" as a config in
conf/spark-defaults.conf. Assuming you haven't set the executor memory
through a different m
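A minimal sketch of the two usual ways to set it (4g is just an example value):

  # conf/spark-defaults.conf
  spark.executor.memory  4g

  # or per job on the spark-submit command line
  spark-submit --executor-memory 4g ...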
I'm running Spark 1.0.1 with SPARK_MEMORY=60g, so 4 executors at that size
would indeed run out of memory (the machine has 110GB). And in fact they
would get repeatedly restarted and killed until eventually Spark gave up.
I'll try with a smaller limit, but it'll be a while - somehow my HDFS got
se
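For the record, the back-of-envelope arithmetic here is roughly:

  4 executors x 60 GB heap = 240 GB requested, vs. 110 GB of physical RAM (before any JVM or OS overhead)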
To add to the pile of information we're asking you to provide, what version
of Spark are you running?
2014-08-13 11:11 GMT-07:00 Shivaram Venkataraman:
> If the JVM heap size is close to the memory limit the OS sometimes kills
> the process under memory pressure. I've usually found that lowerin
If the JVM heap size is close to the memory limit the OS sometimes kills
the process under memory pressure. I've usually found that lowering the
executor memory size helps.
Shivaram
On Wed, Aug 13, 2014 at 11:01 AM, Matei Zaharia wrote:
> What is your Spark executor memory set to? (You can see
What is your Spark executor memory set to? (You can see it in Spark's web UI at
http://<driver>:4040 under the Executors tab.) One thing to be aware of is that
the JVM never really releases memory back to the OS, so it will keep filling up
to the maximum heap size you set. Maybe 4 executors with that mu
After a lot of grovelling through logs, I found out that the Nagios monitor
process detected that the machine was almost out of memory, and killed the
SNAP executor process.
So why is the machine running out of memory? Each node has 128GB of RAM, 4
executors, about 40GB of data. It did run out of
Hi Avishek,
I'm running on a manual cluster setup, and all the code is Scala. The load
averages don't seem high when I see these failures (about 12 on a 16-core
machine).
Ravi
Same here, Ravi. See my post on a similar thread.
Are you running in YARN client mode?
On Aug 7, 2014 2:56 PM, "rpandya" wrote:
> I'm running into a problem with executors failing, and it's not clear
> what's
> causing it. Any suggestions on how to diagnose & fix it would be
> appreciated.
>
> There a
And... PEBCAK
I mistakenly believed I had set PYSPARK_PYTHON to a Python 2.7 install, but it
was actually pointing at a Python 2.6 install on the remote nodes, hence
incompatible with what the master was sending. I have set it to point to the
correct version everywhere and it now works.
Apologies for the false alarm.
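For anyone else who runs into this, a sketch of one way to pin the interpreter consistently (the path is an example; the important part is that every node resolves to the same Python version):

  # conf/spark-env.sh on the driver and on every worker node
  export PYSPARK_PYTHON=/usr/bin/python2.7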
Hi Andrew,
Thanks for your note. Yes, I see a stack trace now. It seems to be an issue
with Python interpreting a function I wish to apply to an RDD. The stack
trace is below. The function is a simple factorial:
def f(n):
    if n == 1: return 1
    return n * f(n-1)
and I'm trying to use it lik
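Presumably the call looks something like the sketch below (the input values are made up, and sc is assumed to be the usual SparkContext from the pyspark shell). The function gets pickled on the driver and unpickled by the worker-side Python processes, which is exactly where a driver/worker interpreter mismatch shows up:

  # assuming sc is the SparkContext from the pyspark shell
  result = sc.parallelize(range(1, 6)).map(f).collect()
  # expected: [1, 2, 6, 24, 120]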
Hi Eric,
Have you checked the executor logs? It is possible they died because of
some exception, and the message you see is just a side effect.
Andrew
2014-07-23 8:27 GMT-07:00 Eric Friedman:
> I'm using Spark 1.0.1 on quite a large cluster, with gobs of memory, etc.
> Cluster resources are
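In case it is useful: the executor stderr is linked from the Executors tab of the driver's web UI, and on YARN the same yarn logs command mentioned earlier in the thread applies. In standalone mode it lives under the worker's work directory (the application directory name below is a placeholder):

  # standalone mode: on the worker machine
  less $SPARK_HOME/work/app-20140723082700-0000/0/stderr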