Currently there’s no way to cache compressed sequence files directly.
Spark SQL uses an in-memory columnar format when caching table rows, so
it must read all the raw data and convert it into that columnar format.
However, you can enable in-memory columnar compression by setting
|spark.sql.inMemoryColumnarStorage.compressed| to |true|. This property
is already set to |true| by default in the master branch and branch-1.2.
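
In case it helps, here is a minimal sketch of setting the flag before
caching. The HiveContext setup and the table name |my_table| are
assumptions for illustration, not taken from your setup:

    import org.apache.spark.sql.hive.HiveContext

    // `sc` is an existing SparkContext; `my_table` is a hypothetical
    // Hive table backed by compressed sequence files.
    val sqlContext = new HiveContext(sc)

    // Enable compression of the in-memory columnar cache
    // (already the default in master and branch-1.2).
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

    // Caching materializes rows into the (compressed) columnar format;
    // the on-disk sequence file compression itself is not preserved.
    sqlContext.cacheTable("my_table")

Note that this compresses the columnar cache itself; it does not keep
the original sequence file compression.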
On 11/13/14 7:16 AM, Sadhan Sood wrote:
We noticed that when we cache data from our Hive tables, which store
data in compressed sequence file format, it gets uncompressed in memory
as it is cached. Is there a way to turn this off and cache the
compressed data as is?