FYI, after further troubleshooting I have logged this as
https://issues.apache.org/jira/browse/SPARK-12511
On Tuesday, 22 December 2015, 18:16, Antony Mayi <[email protected]>
wrote:
I narrowed it down to the problem described, for example, here:
https://bugs.openjdk.java.net/browse/JDK-6293787
It is the mass finalization of zip Inflater/Deflater objects, which can't keep
up with the rate at which these instances are being garbage collected. As the
JDK bug report (which was not accepted as a bug) suggests, the underlying error
is suboptimal destruction of these instances.
I am not sure where the zip usage comes from: for all the compressors used in
Spark I was using the default snappy codec.
I am now trying to run with all the spark.*.compress options disabled, and so
far this seems to have improved things dramatically: the finalization looks to
be keeping up and the heap is stable.
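For anyone who wants to try the same workaround, here is a minimal sketch of how the spark.*.compress options could be switched off on the spark-submit command line. The option list is an assumption taken from the Spark 1.5 configuration documentation, not necessarily the exact set disabled in this thread, and the helper name and app.py are hypothetical.

```python
# Hypothetical helper: builds spark-submit "--conf" arguments that switch off
# Spark's compression options. The option names are an assumption based on the
# Spark 1.5 configuration docs, not the exact set used in this thread.
COMPRESS_OPTIONS = [
    "spark.broadcast.compress",
    "spark.rdd.compress",
    "spark.shuffle.compress",
    "spark.shuffle.spill.compress",
    "spark.eventLog.compress",
]


def no_compress_conf_args(options=COMPRESS_OPTIONS):
    """Return a flat list like ['--conf', 'spark.rdd.compress=false', ...]."""
    args = []
    for name in options:
        args.extend(["--conf", "%s=false" % name])
    return args


if __name__ == "__main__":
    # app.py is a placeholder for the actual streaming application.
    print(" ".join(["spark-submit"] + no_compress_conf_args() + ["app.py"]))
```

Re-enabling the options one at a time afterwards may help narrow down which code path creates the Inflater/Deflater instances.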
Any input is still welcome!
On Tuesday, 22 December 2015, 12:17, Ted Yu <[email protected]> wrote:
This might be related, but the jmap output there looks different:
http://stackoverflow.com/questions/32537965/huge-number-of-io-netty-buffer-poolthreadcachememoryregioncacheentry-instances
On Tue, Dec 22, 2015 at 2:59 AM, Antony Mayi <[email protected]>
wrote:
I have a streaming app (pyspark 1.5.2 on YARN) that is crashing with an OOM in
the driver (the JVM part, not Python). No matter how big a heap is assigned, it
eventually runs out.
When checking the heap, it is all taken by "byte" items of
io.netty.buffer.PoolThreadCache. The number of
io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry instances is constant,
yet the number of [B ("bytes") keeps growing, as does the number of Finalizer
instances. When checking the Finalizer instances, they are all of
ZipFile$ZipFileInputStream and ZipFile$ZipFileInflaterInputStream:
 num     #instances         #bytes  class name
----------------------------------------------
   1:        123556      278723776  [B
   2:        258988       10359520  java.lang.ref.Finalizer
   3:        174620        9778720  java.util.zip.Deflater
   4:         66684        7468608  org.apache.spark.executor.TaskMetrics
   5:         80070        7160112  [C
   6:        282624        6782976  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
   7:        206371        4952904  java.lang.Long
The platform is using netty 3.6.6 and OpenJDK 1.8 (I tried 1.7 as well, with
the same issue).
Would anyone have a clue how to troubleshoot this further?
thx.
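
One possible way to follow up on a histogram like the one above is to save two `jmap -histo <pid>` snapshots some minutes apart and compare them, so that the classes that actually grow stand out from those that are merely large. The sketch below assumes the usual four-column jmap layout (num, #instances, #bytes, class name); the function names are hypothetical.

```python
import re

# Matches a jmap -histo row such as "   1:   123556   278723776  [B"
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Map class name -> (instances, bytes) for each histogram row."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[m.group(4)] = (int(m.group(2)), int(m.group(3)))
    return result


def growth(before, after):
    """Return (class, instance delta, byte delta), largest byte delta first."""
    rows = []
    for name, (n_after, b_after) in after.items():
        n_before, b_before = before.get(name, (0, 0))
        rows.append((name, n_after - n_before, b_after - b_before))
    return sorted(rows, key=lambda r: r[2], reverse=True)
```

Calling `growth(parse_histo(first), parse_histo(second))` on the text of two snapshots lists the fastest-growing classes first. Header and separator lines are skipped because they do not match the row pattern.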