Yes, there is.
But the RDD is more than 10 TB, and compression does not help.
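For anyone curious, a minimal sketch of one way to switch it on, assuming the Java API; the app name is a placeholder, and spark.rdd.compress only applies to serialized storage levels:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// spark.rdd.compress compresses serialized RDD partitions, trading
// extra CPU time for a smaller memory footprint. It only takes effect
// for serialized storage levels, so the RDD must also be persisted
// with e.g. StorageLevel.MEMORY_ONLY_SER().
SparkConf conf = new SparkConf()
    .setAppName("DeDup") // placeholder name
    .set("spark.rdd.compress", "true");
JavaSparkContext sc = new JavaSparkContext(conf);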
On Wed, Jul 15, 2015 at 8:36 PM, Ted Yu wrote:
> bq. serializeUncompressed()
>
> Is there a method which enables compression ?
>
> Just wondering if that would reduce the memory footprint.
>
> Cheers
On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari <saeed.shahriv...@gmail.com> wrote:
I use a simple map/reduce step in a Java/Spark program to remove duplicate
documents from a large (10 TB compressed) sequence file containing HTML
pages. Here is the partial code:
JavaPairRDD<BytesWritable, NullWritable> inputRecords =
    sc.sequenceFile(args[0], BytesWritable.class, NullWritable.class)
      .coalesce(numMaps); // numMaps: target number of input partitions
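The dedup step itself is cut off above; a minimal sketch of how such a step might look, assuming pages are keyed by an MD5 digest of their raw bytes (the digest choice, Java 8 lambdas, and all names below are assumptions, not the original code):

import java.util.Arrays;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.io.BytesWritable;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Copy each page out of the (possibly reused) Hadoop Writable buffer,
// then key it by an MD5 digest of its bytes.
JavaPairRDD<String, BytesWritable> keyed = inputRecords.mapToPair(rec -> {
    byte[] page = Arrays.copyOf(rec._1().getBytes(), rec._1().getLength());
    return new Tuple2<>(DigestUtils.md5Hex(page), new BytesWritable(page));
});

// Keep an arbitrary single copy per digest. reduceByKey combines
// map-side, so only one record per key crosses the shuffle.
JavaPairRDD<String, BytesWritable> deduplicated = keyed.reduceByKey((a, b) -> a);

Since reduceByKey combines before the shuffle, this should move far less data than a groupByKey-based dedup would.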