Actually, after reading an article on Cassandra 1.0 compression just now
(http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression),
I am even more puzzled. We didn't specify any compression options in our
schema, so does Cassandra 1.0 apply some compression by default, or is the
data reduction purely due to the schema change? Thanks.
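To put a number on it, here is a quick back-of-the-envelope check of the
figures from my earlier message (quoted below). It assumes the 488GB total
splits evenly across the two replicas and ignores things like compaction
overhead and snapshots. The function names are just for illustration.

```python
# Rough sanity check of the storage figures quoted below.
# Assumption: with replication factor 2, every row is stored twice,
# so a single copy is roughly the cluster total divided by RF.

def unreplicated_size_gb(total_gb: float, replication_factor: int) -> float:
    """Approximate on-disk size of a single copy of the data."""
    return total_gb / replication_factor


def reduction_percent(old_gb: float, new_gb: float) -> float:
    """Percentage decrease going from old_gb to new_gb."""
    return (old_gb - new_gb) / old_gb * 100


old_total = 534.0  # single-node 0.8.x, no replication, supercolumns
new_total = 488.0  # 3-node 1.0 cluster, RF=2, regular columns

new_single_copy = unreplicated_size_gb(new_total, 2)
print(f"single copy on 1.0: {new_single_copy:.0f}GB")
print(f"reduction vs 0.8.x: {reduction_percent(old_total, new_single_copy):.1f}%")
```

By this estimate a single copy of the data takes about 54% less space than
before, which is the gap I'm trying to account for.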

-- Y.

On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun <yiming....@gmail.com> wrote:

> Hi,
>
> We are trying to estimate the amount of storage we need for a production
> Cassandra cluster. While doing the calculation, I noticed a dramatic
> difference in the storage space used by Cassandra data files.
>
> Our previous setup consisted of a single-node Cassandra 0.8.x with no
> replication, with the data stored using supercolumns; the data files
> totaled about 534GB on disk.
>
> A few weeks ago, I put together a 3-node cluster running Cassandra 1.0
> with a replication factor of 2. The data is flattened out and stored
> using regular columns, and the aggregate data file size is only 488GB
> (which would be 244GB without replication).
>
> This is a very dramatic reduction in storage needs, and it is certainly
> good news for how much storage we need to provision. However, because the
> reduction is so large, I would like to make sure it is correct before
> submitting the estimate, and also understand why there is such a
> difference. I know Cassandra 1.0 does data compression, but does the
> schema change from supercolumns to regular columns also help reduce
> storage usage? Thanks.
>
> -- Y.
>
