I don’t think it’s just memory overhead. It might be better to use an execute 
with lesser heap space(30GB?). 46 GB would mean more data load into memory and 
more GC, which can cause issues.

Also, have you tried to persist data in any way? If so, then that might be 
causing an issue.

Lastly, I am not sure if your data has a skew and if that is forcing a lot of 
data to be on one executor node.

Sent from my Windows 10 phone

From: Rodrick Brown<mailto:rodr...@orchardplatform.com>
Sent: Friday, November 25, 2016 12:25 AM
To: Aniket Bhatnagar<mailto:aniket.bhatna...@gmail.com>
Cc: user<mailto:user@spark.apache.org>
Subject: Re: OS killing Executor due to high (possibly off heap) memory usage

Try setting spark.yarn.executor.memoryOverhead 10000

On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar 
<aniket.bhatna...@gmail.com<mailto:aniket.bhatna...@gmail.com>> wrote:
Hi Spark users

I am running a job that does join of a huge dataset (7 TB+) and the executors 
keep crashing randomly, eventually causing the job to crash. There are no out 
of memory exceptions in the log and looking at the dmesg output, it seems like 
the OS killed the JVM because of high memory usage. My suspicion is towards off 
heap usage of executor is causing this as I am limiting the on heap usage of 
executor to be 46 GB and each host running the executor has 60 GB of RAM. After 
the executor crashes, I can see that the external shuffle manager 
(org.apache.spark.network.server.TransportRequestHandler) logs a lot of channel 
closed exceptions in yarn node manager logs. This leads me to believe that 
something triggers out of memory during shuffle read. Is there a configuration 
to completely disable usage of off heap memory? I have tried setting 
spark.shuffle.io<http://spark.shuffle.io>.preferDirectBufs=false but the 
executor is still getting killed by the same error.

Cluster details:
10 AWS c4.8xlarge hosts
RAM on each host - 60 GB
Number of cores on each host - 36
Additional hard disk on each host - 8 TB

Spark configuration:
dynamic allocation enabled
external shuffle service enabled
spark.driver.memory 1024M
spark.executor.memory 47127M
Spark master yarn-cluster

Sample error in yarn node manager:
2016-11-24 10:34:06,507 ERROR 
org.apache.spark.network.server.TransportRequestHandler (shuffle-server-50): 
Error sending result 
ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=919299554123, 
chunkIndex=0}, 
buffer=FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-9276971b2259/2c/shuffle_3_963_0.data,
 offset=0, length=669014456}} to 
/10.192.108.170:52782<http://10.192.108.170:52782>; closing connection
java.nio.channels.ClosedChannelException

Error in dmesg:
[799873.309897] Out of memory: Kill process 50001 (java) score 927 or sacrifice 
child
[799873.314439] Killed process 50001 (java) total-vm:65652448kB, 
anon-rss:57246528kB, file-rss:0kB

Thanks,
Aniket



--

[Orchard Platform]<http://www.orchardplatform.com/>

Rodrick Brown / DevOPs

9174456839 / rodr...@orchardplatform.com<mailto:rodr...@orchardplatform.com>

Orchard Platform
101 5th Avenue, 4th Floor, New York, NY

NOTICE TO RECIPIENTS: This communication is confidential and intended for the 
use of the addressee only. If you are not an intended recipient of this 
communication, please delete it immediately and notify the sender by return 
email. Unauthorized reading, dissemination, distribution or copying of this 
communication is prohibited. This communication does not constitute an offer to 
sell or a solicitation of an indication of interest to purchase any loan, 
security or any other financial product or instrument, nor is it an offer to 
sell or a solicitation of an indication of interest to purchase any products or 
services to any persons who are prohibited from receiving such information 
under applicable law. The contents of this communication may not be accurate or 
complete and are subject to change without notice. As such, Orchard App, Inc. 
(including its subsidiaries and affiliates, "Orchard") makes no representation 
regarding the accuracy or completeness of the information contained herein. The 
intended recipient is advised to consult its own professional advisors, 
including those specializing in legal, tax and accounting matters. Orchard does 
not provide legal, tax or accounting advice.

Reply via email to