Hi Arun,

Based on the logs you shared, it looks like an HDFS issue:
https://issues.apache.org/jira/browse/HDFS-8475

Nirav

On Thu, Feb 11, 2016 at 9:38 PM, <arun.bong...@cognizant.com> wrote:

> Hi All,
>
> Even I have the same issue.
>
> The EMR configuration is a 3-node cluster with m3.2xlarge instances.
>
> I'm trying to read a 100 GB file in spark-sql.
>
> I have set the following on Spark:
>
> export SPARK_EXECUTOR_MEMORY=4G
> export SPARK_DRIVER_MEMORY=12G
>
> export SPARK_EXECUTOR_INSTANCES=16
> export SPARK_EXECUTOR_CORES=16
>
> spark.kryoserializer.buffer.max 2000m
> spark.driver.maxResultSize 0
>
> -XX:MaxPermSize=1024M
>
> PFB the error:
>
> 16/02/11 15:32:00 WARN DFSClient: DFSOutputStream ResponseProcessor exception for block
> BP-1257713490-xx.xx.xx.xx-1455121562682:blk_1073742405_10984
> java.io.EOFException: Premature EOF: no length prefix available
>     at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:745)
>
> Kindly help me understand the configuration.
>
> Thanks in advance.
>
> Regards,
> Arun
>
> ------------------------------
> *From:* Kuchekar [kuchekar.nil...@gmail.com]
> *Sent:* 11 February 2016 09:42
> *To:* Nirav Patel
> *Cc:* spark users
> *Subject:* Re: Spark executor Memory profiling
>
> Hi Nirav,
>
> I faced a similar issue with YARN, EMR 1.5.2, and the following Spark conf
> helped me. You can set the values accordingly:
>
> conf = (SparkConf().set("spark.master", "yarn-client")
>                    .setAppName("HalfWay")
>                    .set("spark.driver.memory", "15G")
>                    .set("spark.yarn.am.memory", "15G"))
>
> conf = (conf.set("spark.driver.maxResultSize", "10G")
>             .set("spark.storage.memoryFraction", "0.6")
>             .set("spark.shuffle.memoryFraction", "0.6")
>             .set("spark.yarn.executor.memoryOverhead", "4000"))
>
> conf = (conf.set("spark.executor.cores", "4")
>             .set("spark.executor.memory", "15G")
>             .set("spark.executor.instances", "6"))
>
> Is it also possible to use reduceBy in place of groupBy? That might help
> the shuffling too.
>
> Kuchekar, Nilesh
>
> On Wed, Feb 10, 2016 at 8:09 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
>
>> We have been trying to solve a memory issue with a Spark job that processes
>> 150 GB of data (on disk). It does a groupBy operation; some of the executors
>> will receive somewhere around 2-4M Scala case objects to work with. We are
>> using the following Spark config:
>>
>> "executorInstances": "15",
>> "executorCores": "1", (we reduced it to one so that a single task gets all
>> of the executorMemory! At least that's the assumption here)
>> "executorMemory": "15000m",
>> "minPartitions": "2000",
>> "taskCpus": "1",
>> "executorMemoryOverhead": "1300",
>> "shuffleManager": "tungsten-sort",
>> "storageFraction": "0.4"
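>>
>> As a rough sanity check of what that configuration asks YARN for (a
>> back-of-the-envelope sketch, assuming the keys above map onto the standard
>> spark.executor.memory and spark.yarn.executor.memoryOverhead properties):
>>
>> heap_mb      = 15000                   # "executorMemory": "15000m"
>> overhead_mb  = 1300                    # "executorMemoryOverhead": "1300"
>> container_mb = heap_mb + overhead_mb   # ~16.3 GB requested from YARN per executor
>> cluster_mb   = container_mb * 15       # ~245 GB across 15 executors
>>
>> With 1 core per executor, a single task is the only thing competing for
>> that 15 GB heap, which is the assumption above.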
>> This is a snippet of what we see in the Spark UI for a job that fails.
>>
>> This is the *stage* of that job that fails:
>>
>> Stage Id:        5 (retry 15)
>> Pool Name:       prod
>> Description:     map at SparkDataJobs.scala:210
>> Submitted:       2016/02/09 21:30:06
>> Duration:        13 min
>> Tasks:           130/389 succeeded (16 failed)
>> Shuffle Read:    1982.6 MB
>> Shuffle Write:   818.7 MB
>> Failure Reason:  org.apache.spark.shuffle.FetchFailedException: Error in opening
>> FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/fasd/appcache/application_1454975800192_0447/blockmgr-abb77b52-9761-457a-b67d-42a15b975d76/0c/shuffle_0_39_0.data,
>> offset=11421300, length=2353}
>>
>> This is one of the *task* attempts from the above stage that threw the OOM:
>>
>> 2   22361   0   FAILED   PROCESS_LOCAL   38 / nd1.mycom.local
>> 2016/02/09 22:10:42   5.2 min   1.6 min   7.4 MB / 375509
>> java.lang.OutOfMemoryError: Java heap space
>>
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
>>     at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
>>     at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
>>     at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
>>     at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>     at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
>>     at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
>>     at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
>>     at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>>     at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>>     at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:3
>>
>> None of the above suggests that it went beyond the 15 GB of memory that I
>> initially allocated. So what am I missing here? What's eating my memory?
>>
>> We tried executorJavaOpts to get a heap dump, but it doesn't seem to work:
>>
>> -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p'
>> -XX:HeapDumpPath=/opt/cores/spark
>>
>> I don't see any cores being generated, nor can I find a heap dump anywhere
>> in the logs.
>>
>> Also, how do I find the YARN container ID for a given Spark executor ID, so
>> that I can investigate the YARN NodeManager and ResourceManager logs for
>> that particular container?
>>
>> PS - The job does not do any caching of intermediate RDDs, as each RDD is
>> used only once by the subsequent step. We use Spark 1.5.2 on YARN in
>> yarn-client mode.
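>>
>> One note on the heap-dump options above: in HotSpot's -XX syntax a leading
>> minus disables a boolean flag, so -XX:-HeapDumpOnOutOfMemoryError actually
>> turns the dump off. A minimal corrected sketch, reusing the same example
>> path and assuming the options are passed via spark.executor.extraJavaOptions:
>>
>> spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark -XX:OnOutOfMemoryError='kill -3 %p'
>>
>> The YARN container user also needs write access to /opt/cores/spark on each
>> NodeManager host for a dump file to show up there.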
>>
>> Thanks
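
For reference, a minimal PySpark sketch of the "reduceBy in place of groupBy" idea suggested earlier in the thread, reading it as reduceByKey versus groupByKey on a pair RDD; the data and the key/value shape here are made up for illustration and are not taken from the original job:

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("reduceByKeySketch"))

# Toy pair RDD standing in for the real (key, case object) records.
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)])

# groupByKey ships every value for a key across the shuffle, and a single task
# must hold all of a key's values in memory before aggregating them:
sums_grouped = pairs.groupByKey().mapValues(sum)

# reduceByKey combines values map-side first, so less data crosses the shuffle
# and no task has to materialize all values for a hot key at once:
sums_reduced = pairs.reduceByKey(lambda a, b: a + b)

print(sums_reduced.collect())   # e.g. [('a', 4), ('b', 6)]

sc.stop()

This only helps when the per-key work can be expressed as an associative merge; if the job genuinely needs every value of a key at once, the fix lies more in partitioning and memory settings than in swapping the operator.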