> On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari <
> saeed.shahriv...@gmail.com> wrote:
>
I use a simple map/reduce step in a Java/Spark program to remove duplicate
documents from a large (10 TB compressed) sequence file containing some
HTML pages. Here is the partial code:
JavaPairRDD<BytesWritable, NullWritable> inputRecords =
    sc.sequenceFile(args[0], BytesWritable.class,
        NullWritable.class).coalesce(numMap);
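
The quoted code stops at reading the input; the dedup step itself isn't shown. A minimal sketch of one way that step could look, assuming dedup by content hash; DigestUtils (Apache commons-codec) and the keep-one reduceByKey rule are illustrative choices, not the poster's code:

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Key each page by a hash of its content, then keep one record per hash.
// Copying to byte[] first matters twice over: Spark reuses Writable objects
// when reading sequence files, and Writables are not java.io.Serializable.
JavaPairRDD<String, byte[]> deduped = inputRecords
    .mapToPair(rec -> {
        byte[] page = rec._1().copyBytes();   // materialize the page bytes
        return new Tuple2<>(DigestUtils.sha256Hex(page), page);
    })
    .reduceByKey((a, b) -> a);                // arbitrary survivor per hash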
> I've filed https://issues.apache.org/jira/browse/SPARK-5077 to try to
> come up with a proper fix. In the meantime, I recommend that you increase
> your Akka frame size.
>
> On Sat, Jan 3, 2015 at 8:51 PM, Saeed Shahrivari <
> saeed.shahriv...@gmail.com> wrote:
>
>> I use the 1.2 version.
>>
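
Both replies point at the same workaround. For reference, a minimal sketch of applying it in Spark 1.x (the setting was removed along with Akka in Spark 2.0; the value is in megabytes, and the app name below is hypothetical):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Raise the Akka frame size to 128 MB before the context is created;
// the Spark 1.x default was 10 MB.
SparkConf conf = new SparkConf()
    .setAppName("CharFrequency")              // hypothetical app name
    .set("spark.akka.frameSize", "128");
JavaSparkContext sc = new JavaSparkContext(conf);

The same setting can also be passed at launch time, e.g.
spark-submit --conf spark.akka.frameSize=128 ...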
> Try increasing the Akka frame size (via the spark.akka.frameSize
> configuration option).
>
> On Sat, Jan 3, 2015 at 10:40 AM, Saeed Shahrivari <
> saeed.shahriv...@gmail.com> wrote:
>
Hi,
I am trying to get the frequency of each Unicode char in a document
collection using Spark. Here is the code snippet that does the job:
JavaPairRDD<LongWritable, Text> rows = sc.sequenceFile(args[0],
    LongWritable.class, Text.class);
rows = rows.coalesce(1);
JavaPairRDD<Character, Long> pairs = rows.flatMapToPair(...);
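
The archive cuts the snippet off at the flatMapToPair call. A hedged reconstruction of the counting step, assuming the stated goal of per-character frequencies (the <Character, Long> pair type and the reduceByKey at the end are inferred, not from the original message):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Emit a (character, 1) pair for every char of every line, then sum
// the counts per character.
JavaPairRDD<Character, Long> pairs = rows.flatMapToPair(row -> {
    List<Tuple2<Character, Long>> out = new ArrayList<>();
    for (char c : row._2().toString().toCharArray()) {
        out.add(new Tuple2<>(c, 1L));
    }
    return out;  // Spark 1.x expects an Iterable here; 2.x wants an Iterator
});
JavaPairRDD<Character, Long> counts = pairs.reduceByKey(Long::sum);

Note that rows.coalesce(1), kept from the original snippet, squeezes the whole input into a single partition, so the count runs with no parallelism.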