I believe spark.rdd.compress requires the data to be serialized. In my
case, I have data that is already compressed on disk, but it becomes
decompressed as I try to cache it. I believe that even when I set
spark.rdd.compress to *true*, Spark will still decompress the data, then
serialize it, and then compress the serialized data.
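That distinction shows up directly in code. Here is a minimal sketch,
assuming Spark 1.x APIs (current when this thread was written) and using
the poster's /data/ path on HDFS: spark.rdd.compress only applies to
serialized storage levels, so a plain cache() (MEMORY_ONLY) stores
deserialized objects and is never compressed.

// Minimal sketch: spark.rdd.compress only affects serialized storage levels.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("rdd-compress-demo")      // hypothetical app name
  .set("spark.rdd.compress", "true")    // default is false

val sc = new SparkContext(conf)

// Reading a codec-compressed text file decompresses it into Java objects.
val rdd = sc.textFile("hdfs:///data/")

// cache() is MEMORY_ONLY: deserialized objects, never compressed.
// MEMORY_ONLY_SER serializes each partition; with spark.rdd.compress=true,
// the serialized bytes are then compressed before being stored in memory.
rdd.persist(StorageLevel.MEMORY_ONLY_SER)

So the decompress-then-serialize-then-compress sequence described above is
exactly what happens on the persist path.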
Check spark.rdd.compress.
On 19 October 2015 at 21:13, ahaider3 wrote:
> Hi,
> A lot of the data I have in HDFS is compressed. I noticed when I load this
> data into Spark and cache it, Spark unrolls the data like normal but stores
> the data uncompressed in memory. For example, suppose /data/ is [...]
Convert your data to Parquet; it saves space and time.
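A minimal sketch of the conversion, assuming Spark 1.x DataFrame APIs
(current at the time of this thread); the input path, JSON format, and
column name are hypothetical stand-ins for whatever actually lives under
/data/:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Hypothetical input; substitute the real format of the data under /data/.
val df = sqlContext.read.json("hdfs:///data/input.json")

// Parquet is columnar and compressed on disk (codec controlled by
// spark.sql.parquet.compression.codec).
df.write.parquet("hdfs:///data/output.parquet")

// Later queries read back only the columns they touch and decompress
// column chunks as needed, which is where the space and time savings
// come from.
val parquetDf = sqlContext.read.parquet("hdfs:///data/output.parquet")
parquetDf.select("someColumn").show()  // hypothetical column name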
Thanks
Best Regards
On Mon, Oct 19, 2015 at 11:43 PM, ahaider3 wrote:
> Hi,
> A lot of the data I have in HDFS is compressed. I noticed when I load this
> data into Spark and cache it, Spark unrolls the data like normal but stores
> the data uncompressed in memory. [...]