Subject: Re: spark times out maybe due to binaryFiles() with more than 1 million files in HDFS
No luck, I am afraid. After giving the namenode 16GB of RAM, I am still
getting an out-of-memory exception, though a somewhat different one:
15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw
exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at o
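
Worth noting: that error is thrown inside the Spark application itself ("User class threw exception" is logged by the YARN ApplicationMaster when it is running the driver in cluster mode), so it points at the driver-side heap rather than the namenode; binaryFiles() has to build a listing of all million-plus files on the driver before any work is scheduled. One possible way to ease that, sketched here purely as an illustration (the part-00 ... part-99 subdirectory layout under /abc/def is invented for the example), is to glob the files in smaller batches so the driver never holds the whole listing at once:

    import org.apache.spark.{SparkConf, SparkContext}

    object BinaryFilesInBatches {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("binaryFiles-in-batches"))

        // Hypothetical layout: the ~1M files spread over 100 subdirectories.
        val globs = (0 until 100).map(i => f"hdfs:///abc/def/part-$i%02d/*")

        // One binaryFiles() call per glob, so the driver only materialises the
        // file listing for a fraction of the files at a time.
        val totalFiles = globs.map(glob => sc.binaryFiles(glob).count()).sum

        println(s"total files seen: $totalFiles")
        sc.stop()
      }
    }

Simply raising the driver heap (e.g. spark-submit's --driver-memory, which sizes the ApplicationMaster's JVM in cluster mode) may also be enough before restructuring anything.
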
Thanks, I did that, and now I am getting an out-of-memory error, but I am not
sure where it occurs. It can't be on the Spark executor, as I have 28GB
allocated to it. It is not the driver either, because I run that locally and
monitor it via jvisualvm. Unfortunately I can't JMX-monitor Hadoop.
From the stack trace
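
One way to narrow down where the memory goes: the file listing that binaryFiles() needs is computed on the driver before any executor does work, so it can be reproduced in isolation and watched in jvisualvm on its own. A rough standalone sketch (it only approximates what binaryFiles does internally; the /abc/def path is taken from the message below, and it assumes the cluster's core-site.xml is on the classpath):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object ListingCost {
      def main(args: Array[String]): Unit = {
        // Reproduce just the driver-side globbing/listing step.
        val conf = new Configuration()     // picks up core-site.xml from the classpath
        val fs = FileSystem.get(conf)      // default filesystem, assumed to be HDFS here
        val start = System.nanoTime()
        val matched = Option(fs.globStatus(new Path("/abc/def/*"))).map(_.length).getOrElse(0)
        val seconds = (System.nanoTime() - start) / 1e9
        println(s"matched $matched files in $seconds s")
      }
    }
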
Try putting a * on the end of xmlDir, i.e.
xmlDir = hdfs:///abc/def/*
rather than
xmlDir = hdfs:///abc/def
and see what happens. I don't know why, but that appears to be more reliable
for me with S3 as the filesystem.
I'm also using binaryFiles, but I've tried running the same command while
w
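
For reference, the suggestion above would look roughly like this in the Scala API (a minimal sketch; the path comes from the earlier message, and the minPartitions argument is optional and only a hint):

    import org.apache.spark.{SparkConf, SparkContext}

    object BinaryFilesGlob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("binaryFiles-glob"))

        // Trailing wildcard, as suggested, rather than the bare directory.
        val xmlDir = "hdfs:///abc/def/*"

        // RDD[(path, PortableDataStream)]; file contents are not read
        // until an action runs.
        val files = sc.binaryFiles(xmlDir, minPartitions = 1000)

        println(s"files matched: ${files.count()}")
        sc.stop()
      }
    }
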