I have a Spark cluster with 3 worker nodes.
- *Workers:* 3
- *Cores:* 48 Total, 48 Used
- *Memory:* 469.8 GB Total, 72.0 GB Used
I want to process a single compressed file (*.gz) on HDFS. The file is 1.5 GB
compressed and 11 GB uncompressed.
When I try to read the compressed file from HDFS, processing is very slow.
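A minimal sketch of the read, assuming spark-shell and a placeholder path:

    // Scala (spark-shell). The path is a placeholder, not the real file.
    val lines = sc.textFile("hdfs:///data/input.gz")
    // gzip is not splittable, so the whole 11 GB is decompressed by one task
    println(lines.partitions.length)  // prints 1
    println(lines.count())            // slow: a single core does all the work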
Yep, I figured that out. I uncompressed the file and it runs much faster
now. Thanks.
On Sun, May 11, 2014 at 8:14 AM, Mayur Rustagi wrote:
> .gz files are not splittable, hence harder to process. The easiest fix is to
> move to a splittable compression like LZO and break the file into multiple
> blocks that can be read and processed in parallel.
.gz files are not splittable, hence harder to process. The easiest fix is to
move to a splittable compression like LZO and break the file into multiple
blocks that can be read and processed in parallel.
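A minimal sketch of the repartition workaround, assuming spark-shell and a
placeholder path (the initial read is still single-threaded, but every stage
after the shuffle runs on all cores):

    // Scala (spark-shell). Path and partition count are placeholders.
    val lines = sc.textFile("hdfs:///data/input.gz")   // 1 partition: gz is unsplittable
    val spread = lines.repartition(48)                 // shuffle across all 48 cores
    val totalChars = spread.map(_.length.toLong).reduce(_ + _)  // parallel from here on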
On 11 May 2014 09:01, "Soumya Simanta" wrote:
>
> I have a Spark cluster with 3 worker nodes.
>
> - *Workers:* 3
> - *Cores:* 48 Total, 48 Used
> - *Memory:* 469.8 GB Total, 72.0 GB Used