Re: Storing Compressed data in HDFS into Spark

2015-10-22 Adnan Haider
I believe spark.rdd.compress only applies when the data is cached in serialized form. In my case, the data is already compressed, and it becomes decompressed when I try to cache it. I believe that even with spark.rdd.compress set to true, Spark will still decompress the data, serialize it, and then compress the serialized data.
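The path described above can be illustrated with a small stdlib analogy. This is not Spark code. `gzip` stands in for the compressed file in HDFS, `pickle` for Spark's serializer and `zlib` for the codec used when spark.rdd.compress is enabled (that setting affects serialized storage levels such as MEMORY_ONLY_SER). The helper names `load_records`, `cache_block` and `read_block` are hypothetical. The point is that the cached block is a fresh serialize-then-compress encoding of the decompressed records, not the original compressed bytes.

```python
import gzip
import pickle
import zlib
from typing import List


def load_records(gz_bytes: bytes) -> List[str]:
    """Read a gzip blob the way a text input would: decompress, then split into lines."""
    return gzip.decompress(gz_bytes).decode("utf-8").splitlines()


def cache_block(records: List[str], compress: bool) -> bytes:
    """Serialize a partition's records, then optionally compress the serialized bytes.

    Hypothetical stand-in: pickle plays the serializer, zlib plays the codec.
    """
    serialized = pickle.dumps(records)
    return zlib.compress(serialized) if compress else serialized


def read_block(block: bytes, compressed: bool) -> List[str]:
    """Undo cache_block: decompress if needed, then deserialize."""
    data = zlib.decompress(block) if compressed else block
    return pickle.loads(data)


if __name__ == "__main__":
    raw = "\n".join(f"user{i % 50},click,{i}" for i in range(10_000)).encode("utf-8")
    on_disk = gzip.compress(raw)        # stands in for the compressed file in HDFS
    records = load_records(on_disk)     # records are decompressed on read
    plain = cache_block(records, compress=False)
    packed = cache_block(records, compress=True)
    print(f"gzip input:                    {len(on_disk)} bytes")
    print(f"serialized cache:              {len(plain)} bytes")
    print(f"serialized + compressed cache: {len(packed)} bytes")
    # The cached block is a new compressed encoding, not the original gzip bytes.
    assert packed != on_disk
    assert read_block(packed, compressed=True) == records
```

Compressing the serialized block trades CPU on every read of the cache for a smaller memory footprint, which is the same trade-off the setting makes.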

2015-10-22 Igor Berman
Check spark.rdd.compress.

On 19 October 2015 at 21:13, ahaider3 wrote:
> Hi,
> A lot of the data I have in HDFS is compressed. I noticed when I load this
> data into spark and cache it, Spark unrolls the data like normal but stores
> the data uncompressed in memory. For example, suppose /data/ is

2015-10-22 Akhil Das
Convert your data to Parquet; it saves space and time.

Thanks
Best Regards

On Mon, Oct 19, 2015 at 11:43 PM, ahaider3 wrote:
> Hi,
> A lot of the data I have in HDFS is compressed. I noticed when I load this
> data into spark and cache it, Spark unrolls the data like normal but stores
> the dat
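One reason a columnar format such as Parquet saves space is that values of the same column are stored next to each other, which gives a compressor more repetition to exploit. The sketch below is only an analogy under stated assumptions. JSON plus `zlib` stands in for Parquet's real encodings (dictionary, run-length and page-level compression), and the function names `row_layout`, `column_layout` and `rows_from_columns` are hypothetical. It serializes the same records row by row and column by column and reports both sizes. It does not show the "time" side (reading only the columns a query needs).

```python
import json
import zlib
from typing import Any, Dict, List

Record = Dict[str, Any]


def row_layout(records: List[Record]) -> bytes:
    """Row-oriented: one JSON object per record, field names repeated on every row."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")


def column_layout(records: List[Record]) -> bytes:
    """Column-oriented: all values of a field stored together.

    Assumes every record has the same fields, as rows of one table would.
    """
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return json.dumps(columns, sort_keys=True).encode("utf-8")


def rows_from_columns(blob: bytes) -> List[Record]:
    """Rebuild the original records from column_layout output."""
    columns = json.loads(blob)
    n = len(next(iter(columns.values()), []))
    return [{key: values[i] for key, values in columns.items()} for i in range(n)]


def compressed_size(blob: bytes) -> int:
    """Size after zlib compression at the highest level."""
    return len(zlib.compress(blob, 9))


if __name__ == "__main__":
    records = [
        {"user": f"user{i % 50}", "event": "click", "ts": 1445500000 + i}
        for i in range(5_000)
    ]
    for name, blob in (("row", row_layout(records)), ("column", column_layout(records))):
        print(f"{name:>6}: {len(blob)} bytes raw, {compressed_size(blob)} bytes compressed")
```

The exact numbers depend on the data; the sketch is meant for comparing the two layouts on your own records, not as a measurement of Parquet itself.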