Currently there’s no way to cache compressed sequence files directly. Spark SQL uses an in-memory columnar format when caching table rows, so it must read all the raw data and convert it into that columnar format. However, you can enable compression of the in-memory columnar cache by setting spark.sql.inMemoryColumnarStorage.compressed to true. This property is already set to true by default in the master branch and branch-1.2.
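
For reference, a minimal sketch of setting this explicitly on Spark 1.x, where older builds do not yet have it on by default (the table name "events" and the app name are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object CacheCompressedExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CacheCompressedExample"))
        val sqlContext = new HiveContext(sc)

        // Enable compression of the in-memory columnar cache; this is already
        // the default on master and branch-1.2, but explicit here for 1.1.
        sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

        // Caching the table materializes it in the (compressed) columnar
        // format; the raw sequence files are still decompressed on read.
        sqlContext.cacheTable("events")
        sqlContext.sql("SELECT COUNT(*) FROM events").collect().foreach(println)
      }
    }

Note that this compresses the columnar cache itself; it does not keep the original sequence-file compression.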

On 11/13/14 7:16 AM, Sadhan Sood wrote:

We noticed that when caching data from our Hive tables, which store data in compressed sequence file format, the data gets uncompressed in memory as it is cached. Is there a way to turn this off and cache the compressed data as is?
