Because column metadata is repetitive, you're still likely to get a
reasonable amount of compression.  This is especially true if the column
names, values, or TTLs in wide rows repeat to some degree.
Compression is almost always beneficial unless you're already CPU bound
or are storing large, high-entropy column values, such as pre-compressed
or encrypted data.
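
To get a rough feel for the difference, here is a minimal sketch using
Python's zlib as a stand-in compressor. The row layout built by
fake_wide_row is hypothetical and is not Cassandra's on-disk format; it
only mimics a wide row whose column names, values, and TTLs repeat. The
random bytes stand in for encrypted or pre-compressed values.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


def fake_wide_row(num_columns: int) -> bytes:
    # Hypothetical layout: column name, value, and TTL joined as text.
    # This is NOT Cassandra's SSTable format, only a stand-in with
    # similar repetition in names, values, and TTLs.
    columns = [
        f"event:{i:08d}|status=ok|ttl=86400\n" for i in range(num_columns)
    ]
    return "".join(columns).encode("utf-8")


repetitive = fake_wide_row(10_000)
# Random bytes mimic encrypted or pre-compressed column values.
high_entropy = os.urandom(len(repetitive))

print(f"repetitive wide row: {compression_ratio(repetitive):.2f}")
print(f"high-entropy values: {compression_ratio(high_entropy):.2f}")
```

The repetitive row should shrink to a small fraction of its size, while
the random bytes should stay at roughly their original size. The exact
numbers on a real cluster will depend on the compressor and chunk size
you configure, so treat this only as an illustration of the principle.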

On Sun, Sep 23, 2012 at 10:29 AM, Tamar Fraenkel <ta...@tok-media.com> wrote:

> Hi!
> In the DataStax documentation
> <http://www.datastax.com/docs/1.0/ddl/column_family> there is an
> explanation of which CFs are a good fit for compression:
>
> When to Use Compression
>
> Compression is best suited for column families where there are many rows,
> with each row having the same columns, or at least many columns in common.
> For example, a column family containing user data such as username, email,
> etc., would be a good candidate for compression. The more similar the data
> across rows, the greater the compression ratio will be, and the larger the
> gain in read performance.
>
> Compression is not as good a fit for column families where each row has a
> different set of columns, or where there are just a few very wide rows.
> Dynamic column families such as this will not yield good compression ratios.
>
> I have many column families where rows share some columns and also have a
> varying number of unique columns per row.
> For example, I have a CF where each row has ~13 shared columns, plus
> anywhere from zero to many unique columns. Will such a CF be a good fit
> for compression?
>
> More generally, is there a rule of thumb for how many shared columns (or
> percentage of columns which are shared) is considered a good fit for
> compression?
>
> Thanks,
>
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>
