Subject: Re: spark times out maybe due to binaryFiles() with more than 1 million files in HDFS
No luck, I am afraid. After giving the namenode 16GB of RAM, I am still
getting an out-of-memory exception, though a somewhat different one:
15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw
exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at o
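
Worth noting: that error is thrown inside the Spark application itself ("User class threw exception" is logged by the YARN ApplicationMaster when it is running the driver in cluster mode), so it points at the driver-side heap rather than the namenode; binaryFiles() has to build a listing of all million-plus files on the driver before any work is scheduled. One possible way to ease that, sketched here purely as an illustration (the part-00 ... part-99 subdirectory layout under /abc/def is invented for the example), is to glob the files in smaller batches so the driver never holds the whole listing at once:

    import org.apache.spark.{SparkConf, SparkContext}

    object BinaryFilesInBatches {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("binaryFiles-in-batches"))

        // Hypothetical layout: the ~1M files spread over 100 subdirectories.
        val globs = (0 until 100).map(i => f"hdfs:///abc/def/part-$i%02d/*")

        // One binaryFiles() call per glob, so the driver only materialises the
        // file listing for a fraction of the files at a time.
        val totalFiles = globs.map(glob => sc.binaryFiles(glob).count()).sum

        println(s"total files seen: $totalFiles")
        sc.stop()
      }
    }

Simply raising the driver heap (e.g. spark-submit's --driver-memory, which sizes the ApplicationMaster's JVM in cluster mode) may also be enough before restructuring anything.
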
Thanks, I did that, and now I am getting an out-of-memory error, but I am not
sure where it occurs. It can't be on the Spark executor, as I have 28GB
allocated to it. It is not the driver either, because I run that locally and
monitor it via jvisualvm. Unfortunately I can't JMX-monitor Hadoop.
From the stack trace
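
One way to narrow down where the memory goes: the file listing that binaryFiles() needs is computed on the driver before any executor does work, so it can be reproduced in isolation and watched in jvisualvm on its own. A rough standalone sketch (it only approximates what binaryFiles does internally; the /abc/def path is taken from the message below, and it assumes the cluster's core-site.xml is on the classpath):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object ListingCost {
      def main(args: Array[String]): Unit = {
        // Reproduce just the driver-side globbing/listing step.
        val conf = new Configuration()     // picks up core-site.xml from the classpath
        val fs = FileSystem.get(conf)      // default filesystem, assumed to be HDFS here
        val start = System.nanoTime()
        val matched = Option(fs.globStatus(new Path("/abc/def/*"))).map(_.length).getOrElse(0)
        val seconds = (System.nanoTime() - start) / 1e9
        println(s"matched $matched files in $seconds s")
      }
    }
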
Try putting a * on the end of xmlDir, i.e.
xmlDir = hdfs:///abc/def/*
rather than
xmlDir = hdfs:///abc/def
and see what happens. I don't know why, but that appears to be more reliable
for me with S3 as the filesystem.
I'm also using binaryFiles, but I've tried running the same command while
w
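
For reference, the suggestion above would look roughly like this in the Scala API (a minimal sketch; the path comes from the earlier message, and the minPartitions argument is optional and only a hint):

    import org.apache.spark.{SparkConf, SparkContext}

    object BinaryFilesGlob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("binaryFiles-glob"))

        // Trailing wildcard, as suggested, rather than the bare directory.
        val xmlDir = "hdfs:///abc/def/*"

        // RDD[(path, PortableDataStream)]; file contents are not read
        // until an action runs.
        val files = sc.binaryFiles(xmlDir, minPartitions = 1000)

        println(s"files matched: ${files.count()}")
        sc.stop()
      }
    }
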