Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
Yes there is. But the RDD is more than 10 TB and compression does not help.

On Wed, Jul 15, 2015 at 8:36 PM, Ted Yu wrote:
> bq. serializeUncompressed()
>
> Is there a method which enables compression?
>
> Just wondering if that would reduce the memory footprint.
>
> Cheers
>
> On Wed, Jul 15,
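Even when compressing the data itself does not help, the "GC overhead limit exceeded" error can sometimes be pushed back by giving executors more heap and storing persisted partitions in serialized form. A hypothetical configuration sketch (the values are illustrative, not from this thread; all keys are standard Spark settings of that era):

```properties
# spark-defaults.conf (example values only)
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.rdd.compress               true
spark.executor.memory            8g
# Disabling the GC overhead limit trades the early error for longer GC pauses.
spark.executor.extraJavaOptions  -XX:-UseGCOverheadLimit
```

Note that `spark.rdd.compress` only affects partitions persisted with a serialized storage level (e.g. `MEMORY_ONLY_SER`), so it will not shrink data that is never cached.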

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Ted Yu
bq. serializeUncompressed()

Is there a method which enables compression?

Just wondering if that would reduce the memory footprint.

Cheers

On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari <saeed.shahriv...@gmail.com> wrote:
> I use a simple map/reduce step in a Java/Spark program to remove
>
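The intuition behind the question is that a compressed in-memory representation occupies far less heap than raw bytes, especially for repetitive data like HTML boilerplate. A minimal stand-alone Java sketch (plain `java.util.zip`, not a Spark API) showing how much a repetitive page shrinks under DEFLATE:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionFootprint {
    /** Returns the DEFLATE-compressed size of the input bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        // Output buffer is padded; incompressible input can grow slightly.
        byte[] buf = new byte[input.length + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        // Repetitive markup, like crawled HTML boilerplate, compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<div class=\"row\">boilerplate html</div>\n");
        }
        byte[] page = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(page.length + " bytes raw, "
                + compressedSize(page) + " bytes compressed");
    }
}
```

The trade-off is CPU: every access to a compressed partition pays a decompression cost, which is why it may still not help when the working set is over 10 TB.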

Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
I use a simple map/reduce step in a Java/Spark program to remove duplicated documents from a large (10 TB compressed) sequence file containing some html pages. Here is the partial code:

JavaPairRDD inputRecords =
    sc.sequenceFile(args[0], BytesWritable.class, NullWritable.class).coalesce(numMap
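The original snippet is cut off before the dedup logic, so the exact job is not shown. A minimal single-JVM sketch of the general idea (keying each page by a hash of its content and keeping the first occurrence, the same shape a `mapToPair` + `reduceByKey` job would have; class and method names here are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    /** Hex-encoded SHA-1 of a document body, used as the dedup key. */
    static String contentKey(String doc) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(doc.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-1 is a required JDK algorithm", e);
        }
    }

    /** Keeps the first document seen for each content key. */
    static List<String> dedup(List<String> docs) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String doc : docs) {
            if (seen.add(contentKey(doc))) {
                unique.add(doc);
            }
        }
        return unique;
    }
}
```

Keying on a fixed-size digest rather than the page bytes keeps the shuffle keys small; at 10 TB the values themselves still dominate executor memory, which is where the GC pressure in this thread comes from.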