> does cassandra 1.0 perform some default compression?

No. The on-disk size depends to some degree on the workload.
If there are a lot of overwrites or deletes you may have rows/columns that need to be compacted. You may have some big old SSTables that have not been compacted for a while.

There is some overhead involved in the super columns: the super col name, length of the name and the number of columns.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/03/2012, at 9:47 AM, Yiming Sun wrote:

> Actually, after I read an article on cassandra 1.0 compression just now (
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I
> am more puzzled. In our schema, we didn't specify any compression options --
> does cassandra 1.0 perform some default compression? or is the data reduction
> purely because of the schema change? Thanks.
>
> -- Y.
>
> On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun <yiming....@gmail.com> wrote:
> Hi,
>
> We are trying to estimate the amount of storage we need for a production
> cassandra cluster. While I was doing the calculation, I noticed a very
> dramatic difference in terms of storage space used by cassandra data files.
>
> Our previous setup consists of a single-node cassandra 0.8.x with no
> replication, and the data is stored using supercolumns, and the data files
> total about 534GB on disk.
>
> A few weeks ago, I put together a cluster consisting of 3 nodes running
> cassandra 1.0 with replication factor of 2, and the data is flattened out and
> stored using regular columns. And the aggregated data file size is only
> 488GB (would be 244GB if no replication).
>
> This is a very dramatic reduction in terms of storage needs, and is certainly
> good news in terms of how much storage we need to provision. However,
> because of the dramatic reduction, I also would like to make sure it is
> absolutely correct before submitting it - and also get a sense of why there
> was such a difference. -- I know cassandra 1.0 does data compression, but
> does the schema change from supercolumn to regular column also help reduce
> storage usage? Thanks.
>
> -- Y.
>
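
The comparison in the quoted message can be checked with a little arithmetic. What follows is only a minimal sketch: it assumes the aggregate on-disk size divides evenly by the replication factor, the function names are made up for illustration, and the sizes are the ones reported in the thread.

```python
def per_replica_size_gb(total_gb: float, replication_factor: int) -> float:
    """Size of one copy of the data, assuming replicas are equal in size."""
    return total_gb / replication_factor


def reduction_fraction(old_gb: float, new_gb: float) -> float:
    """Fraction by which new_gb is smaller than old_gb."""
    return 1 - new_gb / old_gb


# Figures reported in the thread.
old_gb = 534.0        # single-node 0.8.x, supercolumns, no replication
new_total_gb = 488.0  # 3-node 1.0 cluster, regular columns, RF = 2

new_gb = per_replica_size_gb(new_total_gb, replication_factor=2)
print(f"one replica: {new_gb:.0f} GB")
print(f"reduction:   {reduction_fraction(old_gb, new_gb):.1%}")
```

With these numbers one replica is 244 GB, roughly 54% smaller than the old 534 GB. This says nothing about how much of the difference comes from uncompacted SSTables versus super column overhead; running a compaction on the old node before measuring would make the comparison fairer.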