Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
... On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari <saeed.shahriv...@gmail.com> wrote: > I use a simple map/reduce step in a Java/Spark program to remove duplicated documents from a large (10 TB compressed) sequence file containing some HTML pages ...

Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
I use a simple map/reduce step in a Java/Spark program to remove duplicated documents from a large (10 TB compressed) sequence file containing some HTML pages. Here is the partial code: JavaPairRDD<BytesWritable, NullWritable> inputRecords = sc.sequenceFile(args[0], BytesWritable.class, NullWritable.class).coalesce(numMap...
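
[The snippet above is cut off, so the following is only a minimal sketch of what a dedup step like this might look like in the Spark 1.x Java API. The partition count numMaps, the SHA-256 digest, and the keep-one reduce are illustrative assumptions, not the poster's actual code:

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import com.google.common.hash.Hashing;
    import scala.Tuple2;

    public class DedupPages {
        public static void main(String[] args) {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("dedup"));
            int numMaps = 10000; // assumed; the original value is truncated

            // key = raw page bytes, value = null, as in the truncated snippet
            JavaPairRDD<BytesWritable, NullWritable> inputRecords =
                sc.sequenceFile(args[0], BytesWritable.class, NullWritable.class)
                  .coalesce(numMaps);

            // Key each page by a digest of its bytes and keep one copy per digest,
            // so the shuffle groups pages by small keys rather than whole pages.
            JavaPairRDD<String, byte[]> unique = inputRecords
                .mapToPair(rec -> new Tuple2<>(
                    Hashing.sha256().hashBytes(rec._1().copyBytes()).toString(),
                    rec._1().copyBytes()))
                .reduceByKey((a, b) -> a); // arbitrary winner drops duplicates

            System.out.println("unique pages: " + unique.count());
            sc.stop();
        }
    }

On a 10 TB input, buffering whole pages per partition during the shuffle is a plausible source of the "GC overhead limit exceeded" error in the subject line; shrinking per-record shuffle size (as above) and raising the partition count are the usual first things to try.]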

Re: spark.akka.frameSize limit error

2015-01-05 Thread Saeed Shahrivari
... https://issues.apache.org/jira/browse/SPARK-5077 to try to come up with a proper fix. In the meantime, I recommend that you increase your Akka frame size. On Sat, Jan 3, 2015 at 8:51 PM, Saeed Shahrivari <saeed.shahriv...@gmail.com> wrote: > I use the 1.2 version. ...
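
[For reference, the workaround recommended here is a one-line configuration change on Spark 1.x. A minimal sketch, where the value 128 is an arbitrary example; spark.akka.frameSize is given in MB and defaults to 10 on the 1.x line:

    SparkConf conf = new SparkConf()
        .setAppName("char-frequency")
        .set("spark.akka.frameSize", "128"); // max driver/executor message size, MB
    JavaSparkContext sc = new JavaSparkContext(conf);

The same setting can be passed at launch time with spark-submit --conf spark.akka.frameSize=128, which avoids a recompile.]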

Re: spark.akka.frameSize limit error

2015-01-03 Thread Saeed Shahrivari
... increasing the Akka frame size (via the spark.akka.frameSize configuration option). On Sat, Jan 3, 2015 at 10:40 AM, Saeed Shahrivari <saeed.shahriv...@gmail.com> wrote: > Hi, I am trying to get the frequency of each Unicode char in a document ...

spark.akka.frameSize limit error

2015-01-03 Thread Saeed Shahrivari
Hi, I am trying to get the frequency of each Unicode char in a document collection using Spark. Here is the code snippet that does the job: JavaPairRDD<LongWritable, Text> rows = sc.sequenceFile(args[0], LongWritable.class, Text.class); rows = rows.coalesce(1); JavaPairRDD pairs = rows.f...
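
[The snippet is truncated at rows.f, so the following is only a guess at the rest of the job: a minimal flatMapToPair/reduceByKey character count in the Spark 1.x Java API. The names counts and out, and the per-character loop, are illustrative assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    JavaPairRDD<LongWritable, Text> rows =
        sc.sequenceFile(args[0], LongWritable.class, Text.class);
    rows = rows.coalesce(1);

    // Emit one (char, 1) pair per character, then sum the counts per char.
    JavaPairRDD<Character, Long> counts = rows
        .flatMapToPair(row -> {
            List<Tuple2<Character, Long>> out = new ArrayList<>();
            for (char c : row._2().toString().toCharArray()) {
                out.add(new Tuple2<>(c, 1L));
            }
            return out; // Spark 1.x expects an Iterable here
        })
        .reduceByKey((a, b) -> a + b);

In Spark 1.x, driver/executor messages larger than spark.akka.frameSize fail with the error in the subject line; that is the limit the replies above suggest raising.]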