I have a Spark cluster with 3 worker nodes.
- *Workers:* 3
- *Cores:* 48 Total, 48 Used
- *Memory:* 469.8 GB Total, 72.0 GB Used
I want to process a single compressed file (*.gz) on HDFS. The file is 1.5 GB
compressed and 11 GB uncompressed.
When I try to read the compressed file from HDFS, processing is very slow.
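A minimal sketch of the read, assuming spark-shell and a placeholder path:

    // Scala (spark-shell). The path is a placeholder, not the real file.
    val lines = sc.textFile("hdfs:///data/input.gz")
    // gzip is not splittable, so the whole 11 GB is decompressed by one task
    println(lines.partitions.length)  // prints 1
    println(lines.count())            // slow: a single core does all the work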
Yep, I figured that out. I uncompressed the file and it runs much faster
now. Thanks.
On Sun, May 11, 2014 at 8:14 AM, Mayur Rustagi wrote:
> .gz files are not splittable, hence harder to process. The easiest fix is to
> move to a splittable compression like LZO and break the file into multiple
> blocks that can be read and processed in parallel.
.gz files are not splittable, hence harder to process. The easiest fix is to
move to a splittable compression like LZO and break the file into multiple
blocks that can be read and processed in parallel.
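A minimal sketch of the repartition workaround, assuming spark-shell and a
placeholder path (the initial read is still single-threaded, but every stage
after the shuffle runs on all cores):

    // Scala (spark-shell). Path and partition count are placeholders.
    val lines = sc.textFile("hdfs:///data/input.gz")   // 1 partition: gz is unsplittable
    val spread = lines.repartition(48)                 // shuffle across all 48 cores
    val totalChars = spread.map(_.length.toLong).reduce(_ + _)  // parallel from here on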
On 11 May 2014 09:01, "Soumya Simanta" wrote:
>
> I have a Spark cluster with 3 worker nodes.
>
> - *Workers:* 3
> - *Cores:* 48 Total, 48 Used
> - *Memory:* 469.8 GB Total, 72.0 GB Used