Hi All,

I am facing the same issue.

My EMR configuration is a 3-node cluster of m3.2xlarge instances.

I'm trying to read a 100 GB file in Spark SQL.

I have set the following on Spark:

export SPARK_EXECUTOR_MEMORY=4G
export SPARK_DRIVER_MEMORY=12G

export SPARK_EXECUTOR_INSTANCES=16
export SPARK_EXECUTOR_CORES=16

spark.kryoserializer.buffer.max 2000m
spark.driver.maxResultSize 0

 -XX:MaxPermSize=1024M
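
For reference, a sketch of the same settings expressed as standard Spark properties (the SparkConf form is illustrative, with values copied from the exports above). One thing worth noting: an m3.2xlarge node has 8 vCPUs, so YARN cannot grant a 16-core executor on this cluster.

from pyspark import SparkConf

# A sketch only, mirroring the exports above.
conf = (SparkConf()
        .set("spark.executor.memory", "4g")
        .set("spark.driver.memory", "12g")
        .set("spark.executor.instances", "16")
        .set("spark.executor.cores", "16")   # exceeds the 8 vCPUs per m3.2xlarge node
        .set("spark.kryoserializer.buffer.max", "2000m")
        .set("spark.driver.maxResultSize", "0"))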


Please find the error below:

16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)

Kindly help me understand what is wrong with this configuration.


Thanks in advance.

Regards
Arun.

________________________________
From: Kuchekar [kuchekar.nil...@gmail.com]
Sent: 11 February 2016 09:42
To: Nirav Patel
Cc: spark users
Subject: Re: Spark executor Memory profiling

Hi Nirav,

                  I faced a similar issue with YARN on EMR (Spark 1.5.2), and the following Spark conf helped me. You can set the values accordingly:

conf = (SparkConf()
        .set("spark.master", "yarn-client")
        .setAppName("HalfWay")
        .set("spark.driver.memory", "15G")
        .set("spark.yarn.am.memory", "15G"))

conf = (conf
        .set("spark.driver.maxResultSize", "10G")
        .set("spark.storage.memoryFraction", "0.6")
        .set("spark.shuffle.memoryFraction", "0.6")
        .set("spark.yarn.executor.memoryOverhead", "4000"))

conf = (conf
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "15G")
        .set("spark.executor.instances", "6"))

It may also be possible to use reduceByKey in place of groupByKey; that could help with the shuffling too.
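
A minimal sketch of that idea with toy data (the RDD contents and the sum aggregation are illustrative, not from the original job):

from pyspark import SparkContext

sc = SparkContext(appName="ReduceByKeyVsGroupByKey")
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

# groupByKey ships every value across the network before aggregating:
grouped = pairs.groupByKey().mapValues(sum)

# reduceByKey combines values within each partition first, so far less
# data crosses the shuffle:
reduced = pairs.reduceByKey(lambda x, y: x + y)

print(sorted(reduced.collect()))  # [('a', 4), ('b', 2)]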


Kuchekar, Nilesh

On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
We have been trying to solve a memory issue with a Spark job that processes 150 GB of data (on disk). It does a groupBy operation; some of the executors will receive somewhere around 2-4M Scala case objects to work with. We are using the following Spark config:


"executorInstances": "15",

     "executorCores": "1", (we reduce it to one so single task gets all the 
executorMemory! at least that's the assumption here)

     "executorMemory": "15000m",

     "minPartitions": "2000",

     "taskCpus": "1",

     "executorMemoryOverhead": "1300",

     "shuffleManager": "tungsten-sort",

      "storageFraction": "0.4"


This is a snippet of what we see in the Spark UI for a job that fails.

This is the stage of this job that fails:

Stage Id:                5 (retry 15)
Pool Name:               prod
Description:             map at SparkDataJobs.scala:210
Submitted:               2016/02/09 21:30:06
Duration:                13 min
Tasks (Succeeded/Total): 130/389 (16 failed)
Shuffle Read:            1982.6 MB
Shuffle Write:           818.7 MB
Failure Reason:          org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data, offset=11421300, length=2353}

This is one of the task attempts from the above stage that threw an OOM:

Index:                       2
Task ID:                     22361
Attempt:                     0
Status:                      FAILED
Locality Level:              PROCESS_LOCAL
Executor ID / Host:          38 / nd1.mycom.local
Launch Time:                 2016/02/09 22:10:42
Duration:                    5.2 min
GC Time:                     1.6 min
Shuffle Read Size / Records: 7.4 MB / 375509
Error:                       java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space
        at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
        at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
        at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
        at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
        at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
        at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
        at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
        at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
        at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
        at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3



None of the above suggests that it used up the 15 GB of memory that I initially allocated. So what am I missing here? What's eating my memory?

We tried executorJavaOpts to get a heap dump, but it doesn't seem to work:

-XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p' 
-XX:HeapDumpPath=/opt/cores/spark

I don't see any core files being generated, nor can I find a heap dump anywhere in the logs.
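
Worth double-checking here: in -XX:-HeapDumpOnOutOfMemoryError, the '-' after 'XX:' disables the option ('+' enables it), which by itself would explain the missing dumps. A minimal sketch of the corrected setting (the dump path is the one above; it must already exist and be writable on every node):

from pyspark import SparkConf

# Note the '+': '-XX:-HeapDumpOnOutOfMemoryError' (with '-') turns heap
# dumps on OOM off rather than on.
conf = SparkConf().set(
    "spark.executor.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark")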

Also, how do I find the YARN container ID for a given Spark executor ID, so that I can investigate the YARN NodeManager and ResourceManager logs for that particular container? (One possible approach is sketched below.)
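
A sketch of one approach, under assumptions (the application ID is taken from the FetchFailedException path above; the yarn CLI call and the regex are illustrative): the YARN aggregated log output is segmented per container, with each segment headed by its container ID and host, so listing those headers and matching on host or executor launch lines usually identifies the container.

import re
import subprocess

# Pull the aggregated YARN logs for the application (requires log
# aggregation to be enabled, and the app to have finished).
app_id = "application_1454975800192_0447"
logs = subprocess.run(["yarn", "logs", "-applicationId", app_id],
                      capture_output=True, text=True).stdout

# Container IDs look like container_1454975800192_0447_01_000039.
for container_id in sorted(set(re.findall(r"container_\d+_\d+_\d+_\d+", logs))):
    print(container_id)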

PS: The job does not cache any intermediate RDDs, as each RDD is used only once by the subsequent step. We use Spark 1.5.2 on YARN in yarn-client mode.


Thanks






