Hi All,

I am hitting the same issue.
My EMR setup is a 3-node cluster of m3.2xlarge instances. I'm trying to read a 100 GB file in spark-sql, and I have set the following:

    export SPARK_EXECUTOR_MEMORY=4G
    export SPARK_DRIVER_MEMORY=12G
    export SPARK_EXECUTOR_INSTANCES=16
    export SPARK_EXECUTOR_CORES=16
    spark.kryoserializer.buffer.max 2000m
    spark.driver.maxResultSize 0
    -XX:MaxPermSize=1024M

This is the error I get:

    16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
    java.io.EOFException: Premature EOF: no length prefix available
            at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
            at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)

Kindly help me understand the configuration. Thanks in advance.

Regards,
Arun

________________________________
From: Kuchekar [kuchekar.nil...@gmail.com]
Sent: 11 February 2016 09:42
To: Nirav Patel
Cc: spark users
Subject: Re: Spark executor Memory profiling

Hi Nirav,

I faced a similar issue with YARN on EMR (Spark 1.5.2), and the following Spark conf helped me. You can adjust the values accordingly:

    conf = (SparkConf()
            .set("spark.master", "yarn-client")
            .setAppName("HalfWay")
            .set("spark.driver.memory", "15G")
            .set("spark.yarn.am.memory", "15G"))
    conf = (conf.set("spark.driver.maxResultSize", "10G")
            .set("spark.storage.memoryFraction", "0.6")
            .set("spark.shuffle.memoryFraction", "0.6")
            .set("spark.yarn.executor.memoryOverhead", "4000"))
    conf = (conf.set("spark.executor.cores", "4")
            .set("spark.executor.memory", "15G")
            .set("spark.executor.instances", "6"))

Would it also be possible to use reduceByKey in place of groupByKey? That might help the shuffling too.
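For illustration, a minimal sketch of that swap in PySpark (the `pairs` RDD and the SparkContext `sc` are hypothetical, not from the job in question):

    # Hypothetical RDD of (key, numeric value) tuples.
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

    # groupByKey ships every value for a key to a single task before
    # aggregating, so a hot key piles millions of objects into one heap:
    sums = pairs.groupByKey().mapValues(lambda vs: sum(vs))

    # reduceByKey combines map-side first, so far less data is shuffled and
    # no task ever has to materialize all values for one key at once:
    sums = pairs.reduceByKey(lambda a, b: a + b)

The map-side combine is the point: with groupByKey, a skewed key concentrates the 2-4M objects mentioned below in a single task, which matches the OOM pattern in the stage shown there.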
Kuchekar, Nilesh

On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com> wrote:

We have been trying to solve a memory issue with a Spark job that processes 150 GB of data (on disk). It does a groupBy operation; some of the executors will receive somewhere around 2-4M Scala case objects to work with. We are using the following Spark config:

    "executorInstances": "15",
    "executorCores": "1", (we reduced it to one so a single task gets all of executorMemory! at least that's the assumption here)
    "executorMemory": "15000m",
    "minPartitions": "2000",
    "taskCpus": "1",
    "executorMemoryOverhead": "1300",
    "shuffleManager": "tungsten-sort",
    "storageFraction": "0.4"

This is a snippet of what we see in the Spark UI for a job that fails. This is the stage of that job that fails:

    Stage Id:                5 (retry 15)
    Pool Name:               prod
    Description:             map at SparkDataJobs.scala:210
    Submitted:               2016/02/09 21:30:06
    Duration:                13 min
    Tasks (Succeeded/Total): 130/389 (16 failed)
    Shuffle Read:            1982.6 MB
    Shuffle Write:           818.7 MB
    Failure Reason:          org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data, offset=11421300, length=2353}

This is one of the task attempts from the above stage that threw the OOM:

    Index:                       2
    Task ID / Attempt:           22361 / 0
    Status / Locality:           FAILED / PROCESS_LOCAL
    Executor ID / Host:          38 / nd1.mycom.local
    Launch Time:                 2016/02/09 22:10:42
    Duration:                    5.2 min
    GC Time:                     1.6 min
    Shuffle Read Size / Records: 7.4 MB / 375509

    java.lang.OutOfMemoryError: Java heap space
            at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
            at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
            at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
            at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
            at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
            at scala.collection.immutable.List.foreach(List.scala:318)
            at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
            at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
            at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
            at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
            at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
            at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3

None of the above suggests that it went over the 15 GB of memory I initially allocated. So what am I missing here? What's eating my memory?

We tried executor Java opts to get a heap dump, but it doesn't seem to work:

    -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p' -XX:HeapDumpPath=/opt/cores/spark

I don't see any core files being generated, nor can I find a heap dump anywhere in the logs.
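One likely reason the dump never appears: in HotSpot flag syntax, -XX:-HeapDumpOnOutOfMemoryError (minus) disables the option, while -XX:+HeapDumpOnOutOfMemoryError (plus) enables it; and kill -3 only triggers a thread dump to the executor's stdout, not a core file. A minimal sketch of the enabled form, in the same PySpark SparkConf style as the snippet above (the path is illustrative; it must exist and be writable on every worker node):

    # Sketch only: enable the executor heap dump (note '+', not '-').
    # Assumes 'conf' is an existing pyspark SparkConf.
    conf = conf.set(
        "spark.executor.extraJavaOptions",
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark")

Since the dump is written on the worker node that hits the OOM, you would look for it under /opt/cores/spark on that node rather than in the YARN logs.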
Also, how do I find the YARN container ID for a given Spark executor ID, so that I can investigate the YARN NodeManager and ResourceManager logs for that particular container?

PS - The job does not do any caching of intermediate RDDs, since each RDD is used just once by the subsequent step. We use Spark 1.5.2 on YARN in yarn-client mode.

Thanks