Hi,

We are trying to estimate how much storage we need for a production
Cassandra cluster.  While doing the calculation, I noticed a dramatic
difference in the disk space used by the Cassandra data files.

Our previous setup is a single-node Cassandra 0.8.x with no replication.
The data is stored using supercolumns, and the data files total about
534GB on disk.

A few weeks ago, I put together a 3-node cluster running Cassandra 1.0
with a replication factor of 2, with the data flattened out and stored
using regular columns.  The aggregate data file size there is only 488GB
(which would be 244GB without replication).
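
For reference, the comparison I am making can be sketched roughly as below.
It assumes both setups hold the same logical data and that every row is
stored exactly replication-factor times; the helper name is made up for
illustration.

```python
# Rough comparison of on-disk size, using the figures quoted above.
# Assumption: both setups hold the same logical data, and each row is
# stored exactly `replication_factor` times across the cluster.

def per_replica_gb(total_gb: float, replication_factor: int) -> float:
    """Hypothetical helper: size of one copy of the data."""
    return total_gb / replication_factor

old_gb = 534.0        # single node, Cassandra 0.8.x, supercolumns, RF = 1
new_total_gb = 488.0  # 3 nodes, Cassandra 1.0, regular columns, RF = 2
rf = 2

new_unique_gb = per_replica_gb(new_total_gb, rf)
reduction = 1 - new_unique_gb / old_gb

print(f"{new_unique_gb:.0f} GB per copy, {reduction:.1%} smaller than {old_gb:.0f} GB")
```

So one copy of the data in the new layout is a bit less than half the size
of the old one.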

This is a dramatic reduction in storage needs, and is certainly good news
for how much storage we need to provision.  However, because the reduction
is so large, I would like to make sure it is correct before submitting the
estimate, and also get a sense of why there is such a difference.  I know
Cassandra 1.0 does data compression, but does the schema change from
supercolumns to regular columns also help reduce storage usage?  Thanks.

-- Y.
