Actually, after reading an article on Cassandra 1.0 compression just now (http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I am more puzzled. In our schema, we didn't specify any compression options. Does Cassandra 1.0 perform some default compression, or is the data reduction purely because of the schema change? Thanks.
-- Y.

On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun <yiming....@gmail.com> wrote:

> Hi,
>
> We are trying to estimate the amount of storage we need for a production
> cassandra cluster. While I was doing the calculation, I noticed a very
> dramatic difference in terms of storage space used by cassandra data files.
>
> Our previous setup consists of a single-node cassandra 0.8.x with no
> replication, and the data is stored using supercolumns, and the data files
> total about 534GB on disk.
>
> A few weeks ago, I put together a cluster consisting of 3 nodes running
> cassandra 1.0 with replication factor of 2, and the data is flattened out
> and stored using regular columns. And the aggregated data file size is
> only 488GB (would be 244GB if no replication).
>
> This is a very dramatic reduction in terms of storage needs, and is
> certainly good news in terms of how much storage we need to provision.
> However, because of the dramatic reduction, I also would like to make sure
> it is absolutely correct before submitting it - and also get a sense of why
> there was such a difference. -- I know cassandra 1.0 does data compression,
> but does the schema change from supercolumn to regular column also help
> reduce storage usage? Thanks.
>
> -- Y.
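
For reference, the arithmetic behind the figures quoted above can be checked with a rough sketch like the one below. The function name `unreplicated_size_gb` is hypothetical, and the calculation assumes every row is stored exactly `replication_factor` times and that the on-disk totals of the two setups are directly comparable (for example, no pending compaction or snapshots inflating either number).

```python
def unreplicated_size_gb(total_on_disk_gb: float, replication_factor: int) -> float:
    """Estimate the size of one copy of the data from the cluster-wide total.

    Hypothetical helper: assumes each row is stored exactly
    `replication_factor` times across the cluster.
    """
    return total_on_disk_gb / replication_factor


# Figures taken from the message above.
old_single_node_gb = 534.0   # cassandra 0.8.x, supercolumns, no replication
new_cluster_total_gb = 488.0  # cassandra 1.0, regular columns, 3 nodes
replication_factor = 2

new_single_copy_gb = unreplicated_size_gb(new_cluster_total_gb, replication_factor)

# Fraction of space saved per copy of the data, relative to the old setup.
reduction = 1 - new_single_copy_gb / old_single_node_gb

print(f"one copy on the new cluster: {new_single_copy_gb:.0f} GB")
print(f"reduction vs. old setup: {reduction:.1%}")
```

This only confirms the size comparison itself. It says nothing about whether the difference comes from compression or from the schema change.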