I would like to add that the compression schemes built into the
in-memory columnar storage only support primitive column types (int,
string, etc.); complex types like array, map, and struct are not
supported.
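
A rough sketch of what that implies for a cached table, assuming a
SQLContext named sqlContext is already in scope and a hypothetical
Parquet data set whose schema mixes primitive and complex columns (the
name and path are made up):

    // Hypothetical schema: id INT, country STRING, tags ARRAY<STRING>,
    //                      geo STRUCT<lat: DOUBLE, lon: DOUBLE>
    val logs = sqlContext.parquetFile("/warehouse/logs")
    logs.registerTempTable("logs")

    sqlContext.cacheTable("logs")
    sqlContext.sql("SELECT COUNT(*) FROM logs").collect() // scan once to materialize the cache

    // Only the primitive columns (id, country) go through the RLE/delta/
    // dictionary codecs when the in-memory format is built; the ARRAY and
    // STRUCT columns are stored without them.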
On 12/20/14 6:17 AM, Sadhan Sood wrote:
Hey Michael,
Thank you for clarifying that. Is Tachyon the right way to get
compressed data in memory, or should we explore the option of adding
compression to cached data? I ask because our uncompressed data set
is too big to fit in memory right now. I see the benefit of Tachyon
not just in storing compressed data in memory, but also in not having
to create a separate table for caching some partitions, like 'cache
table table_cached as select * from table where date = 201412XX',
the way we are doing right now.
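
For reference, a sketch of that workaround as it would be run
programmatically, assuming a SQLContext (or HiveContext) named
sqlContext is already in scope; the table name and date value are the
placeholders from the mail above.

    // Cache one day's partition under a separate name.
    // 201412XX is the placeholder date from the mail.
    sqlContext.sql(
      """CACHE TABLE table_cached AS
        |SELECT * FROM table WHERE date = 201412XX""".stripMargin)

    // Queries then have to be rewritten to hit the cached copy:
    sqlContext.sql("SELECT COUNT(*) FROM table_cached").collect()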
On Thu, Dec 18, 2014 at 6:46 PM, Michael Armbrust
<mich...@databricks.com> wrote:
There is only column-level encoding (run-length encoding, delta
encoding, dictionary encoding) and no generic compression.
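
To make that concrete, here is a toy illustration of two of those
encodings on a small string column; this is plain Scala for intuition
only, not Spark's actual in-memory columnar code.

    // A small column of values, as it might appear in one batch of cached rows.
    val column = Seq("US", "US", "US", "CA", "CA", "US")

    // Run-length encoding: collapse consecutive repeats into (value, runLength) pairs.
    val rle = column.foldLeft(List.empty[(String, Int)]) {
      case ((v, n) :: rest, x) if v == x => (v, n + 1) :: rest
      case (acc, x)                      => (x, 1) :: acc
    }.reverse
    // rle == List(("US", 3), ("CA", 2), ("US", 1))

    // Dictionary encoding: store each distinct value once, keep small integer ids.
    val dict = column.distinct.zipWithIndex.toMap // Map("US" -> 0, "CA" -> 1)
    val ids  = column.map(dict)                   // 0, 0, 0, 1, 1, 0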
On Thu, Dec 18, 2014 at 12:07 PM, Sadhan Sood
<sadhan.s...@gmail.com> wrote:
Hi All,
Wondering whether, when caching a table backed by lzo-compressed
Parquet data, Spark also compresses it (using lzo/gzip/snappy) on top
of the column-level encoding, or only does the column-level encoding,
when "spark.sql.inMemoryColumnarStorage.compressed" is set to true.
I ask because when I try to cache the data, I notice the memory being
used is almost as much as the uncompressed size of the data.
Thanks!
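
As a point of reference, a minimal sketch of the setup being asked
about, using the Spark 1.x-era SQLContext API; the application name,
table name, and path are made up.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("cache-size-check"))
    val sqlContext = new SQLContext(sc)

    // Enables the column-level encodings (RLE, delta, dictionary) for cached
    // data; per the reply above, no extra lzo/gzip/snappy pass is applied on top.
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

    // Hypothetical path to the lzo-compressed Parquet data set.
    val events = sqlContext.parquetFile("/warehouse/events")
    events.registerTempTable("events")

    sqlContext.cacheTable("events")
    sqlContext.sql("SELECT COUNT(*) FROM events").collect() // scan once so the cache is materialized

    // The resulting in-memory size is what shows up under the Storage tab of
    // the web UI; that is the number being compared to the uncompressed size.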