Hi Arun,

Based on the logs you shared, it looks like an HDFS issue:
https://issues.apache.org/jira/browse/HDFS-8475

Nirav

On Thu, Feb 11, 2016 at 9:38 PM, <arun.bong...@cognizant.com> wrote:

> Hi All,
>
> Even I have the same issue.
>
> The EMR configuration is a 3-node cluster with m3.2xlarge instances.
>
> I'm trying to read a 100 GB file in spark-sql.
>
> I have set the following on Spark:
>
> export SPARK_EXECUTOR_MEMORY=4G
> export SPARK_DRIVER_MEMORY=12G
>
> export SPARK_EXECUTOR_INSTANCES=16
> export SPARK_EXECUTOR_CORES=16
>
> spark.kryoserializer.buffer.max 2000m
> spark.driver.maxResultSize 0
>
> -XX:MaxPermSize=1024M
>
> PFB the error:
>
> 16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block
> BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
> java.io.EOFException: Premature EOF: no length prefix available
>     at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)
>
> Kindly help me understand the configuration.
>
> Thanks in advance.
>
> Regards,
> Arun
>
> ------------------------------
> *From:* Kuchekar [kuchekar.nil...@gmail.com]
> *Sent:* 11 February 2016 09:42
> *To:* Nirav Patel
> *Cc:* spark users
> *Subject:* Re: Spark executor Memory profiling
>
> Hi Nirav,
>
> I faced a similar issue with YARN, EMR 1.5.2, and the following Spark conf
> helped me. You can set the values accordingly:
>
> conf = (SparkConf().set("spark.master", "yarn-client")
>                    .setAppName("HalfWay")
>                    .set("spark.driver.memory", "15G")
>                    .set("spark.yarn.am.memory", "15G"))
>
> conf = (conf.set("spark.driver.maxResultSize", "10G")
>             .set("spark.storage.memoryFraction", "0.6")
>             .set("spark.shuffle.memoryFraction", "0.6")
>             .set("spark.yarn.executor.memoryOverhead", "4000"))
>
> conf = (conf.set("spark.executor.cores", "4")
>             .set("spark.executor.memory", "15G")
>             .set("spark.executor.instances", "6"))
>
> Is it also possible to use reduceBy in place of groupBy? That might help
> the shuffling too.
>
> Kuchekar, Nilesh
>
> On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
>
>> We have been trying to solve a memory issue with a Spark job that processes
>> 150 GB of data (on disk). It does a groupBy operation; some of the executors
>> will receive somewhere around 2-4M Scala case objects to work with. We are
>> using the following Spark config:
>>
>> "executorInstances": "15",
>> "executorCores": "1", (we reduced it to one so that a single task gets all
>> of the executorMemory! At least that's the assumption here)
>> "executorMemory": "15000m",
>> "minPartitions": "2000",
>> "taskCpus": "1",
>> "executorMemoryOverhead": "1300",
>> "shuffleManager": "tungsten-sort",
>> "storageFraction": "0.4"
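>>
>> As a rough sanity check of what that configuration asks YARN for (a
>> back-of-the-envelope sketch, assuming the keys above map onto the standard
>> spark.executor.memory and spark.yarn.executor.memoryOverhead properties):
>>
>> heap_mb      = 15000                   # "executorMemory": "15000m"
>> overhead_mb  = 1300                    # "executorMemoryOverhead": "1300"
>> container_mb = heap_mb + overhead_mb   # ~16.3 GB requested from YARN per executor
>> cluster_mb   = container_mb * 15       # ~245 GB across 15 executors
>>
>> With 1 core per executor, a single task is the only thing competing for
>> that 15 GB heap, which is the assumption above.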
>> This is a snippet of what we see in the Spark UI for a job that fails.
>>
>> This is the *stage* of that job that fails:
>>
>> Stage Id:        5 (retry 15)
>> Pool Name:       prod
>> Description:     map at SparkDataJobs.scala:210
>> Submitted:       2016/02/09 21:30:06
>> Duration:        13 min
>> Tasks:           130/389 succeeded (16 failed)
>> Shuffle Read:    1982.6 MB
>> Shuffle Write:   818.7 MB
>> Failure Reason:  org.apache.spark.shuffle.FetchFailedException: Error in opening
>> FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data,
>> offset=11421300, length=2353}
>>
>> This is one of the *task* attempts from the above stage that threw the OOM:
>>
>> 2   22361   0   FAILED   PROCESS_LOCAL   38 / nd1.mycom.local
>> 2016/02/09 22:10:42   5.2 min   1.6 min   7.4 MB / 375509
>> java.lang.OutOfMemoryError: Java heap space
>>
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
>>     at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
>>     at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
>>     at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
>>     at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>     at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
>>     at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
>>     at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
>>     at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>>     at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>>     at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3
>>
>> None of the above suggests that it went beyond the 15 GB of memory that I
>> initially allocated. So what am I missing here? What's eating my memory?
>>
>> We tried executorJavaOpts to get a heap dump, but it doesn't seem to work:
>>
>> -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p'
>> -XX:HeapDumpPath=/opt/cores/spark
>>
>> I don't see any cores being generated, nor can I find a heap dump anywhere
>> in the logs.
>>
>> Also, how do I find the YARN container ID for a given Spark executor ID, so
>> that I can investigate the YARN NodeManager and ResourceManager logs for
>> that particular container?
>>
>> PS - The job does not do any caching of intermediate RDDs, as each RDD is
>> used only once by the subsequent step. We use Spark 1.5.2 on YARN in
>> yarn-client mode.
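>>
>> One note on the heap-dump options above: in HotSpot's -XX syntax a leading
>> minus disables a boolean flag, so -XX:-HeapDumpOnOutOfMemoryError actually
>> turns the dump off. A minimal corrected sketch, reusing the same example
>> path and assuming the options are passed via spark.executor.extraJavaOptions:
>>
>> spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark -XX:OnOutOfMemoryError='kill -3 %p'
>>
>> The YARN container user also needs write access to /opt/cores/spark on each
>> NodeManager host for a dump file to show up there.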
>>
>> Thanks
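
For reference, a minimal PySpark sketch of the "reduceBy in place of groupBy" idea suggested earlier in the thread, reading it as reduceByKey versus groupByKey on a pair RDD; the data and the key/value shape here are made up for illustration and are not taken from the original job:

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("reduceByKeySketch"))

# Toy pair RDD standing in for the real (key, case object) records.
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)])

# groupByKey ships every value for a key across the shuffle, and a single task
# must hold all of a key's values in memory before aggregating them:
sums_grouped = pairs.groupByKey().mapValues(sum)

# reduceByKey combines values map-side first, so less data crosses the shuffle
# and no task has to materialize all values for a hot key at once:
sums_reduced = pairs.reduceByKey(lambda a, b: a + b)

print(sums_reduced.collect())   # e.g. [('a', 4), ('b', 6)]

sc.stop()

This only helps when the per-key work can be expressed as an associative merge; if the job genuinely needs every value of a key at once, the fix lies more in partitioning and memory settings than in swapping the operator.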