1.5TB is incredibly high; that doesn't look like a configuration problem. Could you paste the code snippet doing the loop and join on the dataset?
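For reference, the pattern that usually produces this looks roughly like the sketch below. This is my guess at the shape of your loop, not your actual code; the "id" key, column names, and the 12 passes are assumptions. If it matches, it is the logical plan that grows with every pass, not the data:

    import org.apache.spark.sql.{DataFrame, SQLContext}

    // Hypothetical reconstruction of the loop described: repeatedly
    // inner-joining a ~500MB frame back onto itself to make it 12x as
    // wide. Each pass nests the previous plan inside the next join, so
    // the logical plan (and the memory the driver/executors spend on
    // analysis and codegen) grows with every iteration.
    def widenByJoins(sqlContext: SQLContext, base: DataFrame, passes: Int): DataFrame = {
      var df = base
      for (i <- 1 to passes) {
        // rename the non-key columns so the join widens instead of colliding
        val slice = base.toDF(base.columns.map(c => if (c == "id") c else s"${c}_$i"): _*)
        df = df.join(slice, "id") // inner join on the assumed key column
        // On 1.6, truncating the lineage every few passes usually keeps
        // memory flat (DataFrame.checkpoint only arrived in Spark 2.1):
        if (i % 3 == 0) df = sqlContext.createDataFrame(df.rdd, df.schema)
      }
      df
    }

If recreating the DataFrame from its RDD like that flattens the memory curve, the joins themselves are fine and it's purely plan growth that is killing the containers before the logistic regression even starts.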
Best regards,

________________________________
From: rachmaninovquartet <rachmaninovquar...@gmail.com>
Sent: Thursday, April 13, 2017 10:08:40 AM
To: user@spark.apache.org
Subject: Yarn containers getting killed, error 52, multiple joins

Hi,

I have a Spark 1.6.2 app (previously tested on 2.0.0 as well). It requires a ton of memory (1.5TB) for a small dataset (~500MB). The memory usage jumps when I loop through and inner join to make the dataset 12 times as wide. The app goes down during or after this loop, when I try to run a logistic regression on the generated DataFrame. I'm using the Scala API (2.10). Dynamic resource allocation is configured. Here are the parameters I'm using:

    --master yarn-client
    --queue analyst
    --executor-cores 5
    --executor-memory 40G
    --driver-memory 30G
    --conf spark.memory.fraction=0.75
    --conf spark.yarn.executor.memoryOverhead=5120

Has anyone seen this, or have an idea how to tune it? There is no way it should need so much memory.

Thanks,
Ian