Hey Pala,

I also find it very hard to get to the bottom of memory issues like this
one based on what's in the logs (so if you do come up with any findings,
please share them here). In the interim, here are a few things you can try:

   - Provision more memory per executor. In theory (and depending on your
   storage level), data can be spilled to disk or recomputed from lineage if
   it doesn't fit into memory, but I have seen plenty of jobs fail when
   memory is underprovisioned.
   - Experiment with both the memory and shuffle fractions
   (spark.storage.memoryFraction and spark.shuffle.memoryFraction).
   - Repartition your data so that you get smaller tasks (see the sketch
   after this list).
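
To make that concrete, here's a rough sketch. The values are just
illustrative starting points for Spark 1.x, not recommendations, and "rows"
is a stand-in for your RDD:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")           // more memory per executor
      .set("spark.storage.memoryFraction", "0.4")   // default 0.6; lower it if you cache little
      .set("spark.shuffle.memoryFraction", "0.4")   // default 0.2; raise it for shuffle-heavy joins

    // More partitions mean smaller tasks on the reduce side:
    val smaller = rows.repartition(2000)            // tune the count to your data volume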

As far as object size goes, since your issue occurs on deserialization, you
could compute the serialized size of each record on the map side and roll
the sizes up into a histogram.
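
A minimal sketch of that idea, assuming Java serialization (which matches
your stack trace) and again using "rows" as a stand-in for your RDD:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    import org.apache.spark.SparkContext._  // pair-RDD operations in Spark 1.x

    // Serialized size of a single record, in bytes
    def serializedSize(o: AnyRef): Int = {
      val bos = new ByteArrayOutputStream()
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(o)
      oos.close()
      bos.size()
    }

    // Histogram keyed by decimal order of magnitude of the record size
    // (bucket k counts records of roughly 10^k bytes)
    val histogram = rows
      .map(r => (math.log10(math.max(serializedSize(r), 1)).toInt, 1L))
      .reduceByKey(_ + _)
      .sortByKey()
      .collect()

That should tell you quickly whether any individual records actually blow
up into the 100s-of-MBs range.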

Hope this helps!

-Sven



On Mon, Jan 12, 2015 at 2:48 PM, Pala M Muthaia <mchett...@rocketfuelinc.com
> wrote:

> Does anybody have insight on this? Thanks.
>
> On Fri, Jan 9, 2015 at 6:30 PM, Pala M Muthaia <
> mchett...@rocketfuelinc.com> wrote:
>
>> Hi,
>>
>> I am using Spark 1.0.1. I am trying to debug an OOM exception I saw
>> during a join step.
>>
>> Basically, I have an RDD of rows that I am joining with another RDD of
>> tuples.
>>
>> Some of the tasks succeeded, but a fair number failed with the OOM
>> exception whose stack is below. The stack belongs to the 'reducer' that
>> is reading shuffle output from the 'mapper'.
>>
>> My question is: what is the object being deserialized here - just a
>> portion of an RDD, or the whole RDD partition assigned to the current
>> reducer? The rows in the RDD could be large, but definitely not something
>> that would run to 100s of MBs in size and thus run out of memory.
>>
>> Also, is there a way to determine the size of the object being
>> deserialized that results in the error (either by looking at some staging
>> HDFS dir or in the logs)?
>>
>> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
>> exceeded)
>> java.util.Arrays.copyOf(Arrays.java:2367)
>> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
>> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
>> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
>> java.lang.StringBuilder.append(StringBuilder.java:204)
>> java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3142)
>> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3050)
>> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2863)
>> java.io.ObjectInputStream.readString(ObjectInputStream.java:1636)
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1339)
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> java.util.ArrayList.readObject(ArrayList.java:771)
>> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> java.lang.reflect.Method.invoke(Method.java:606)
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
>> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>> org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)
>>
>>
>>
>> Thanks,
>> pala
>>
>
>


-- 
http://sites.google.com/site/krasser/?utm_source=sig
