> does cassandra 1.0 perform some default compression? 
No. 

The on-disk size depends to some degree on the workload.

If there are a lot of overwrites or deletes, you may have rows/columns that
still need to be compacted. You may also have some big old SSTables that have
not been compacted for a while.
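A toy model may help show why overwrites and deletes inflate disk usage until compaction runs. This is a minimal sketch, not Cassandra's storage engine: each hypothetical SSTable is a dict of (row_key, column_name) -> (timestamp, value), a value of None stands in for a tombstone, and compaction is modelled as a last-write-wins merge. Real Cassandra only purges tombstones older than gc_grace_seconds, which the hypothetical `drop_tombstones` flag simplifies away.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: (row_key, column_name) -> (write_timestamp, value).
# A value of None stands for a tombstone (a delete marker).
Cell = Tuple[int, Optional[str]]
SSTable = Dict[Tuple[str, str], Cell]


def cell_count(sstables: List[SSTable]) -> int:
    """Cells stored on disk across all SSTables, including obsolete versions."""
    return sum(len(table) for table in sstables)


def compact(sstables: List[SSTable], drop_tombstones: bool = True) -> SSTable:
    """Merge SSTables, keeping only the newest version of each cell."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            current = merged.get(key)
            # Last write wins; on a timestamp tie the first-seen version is kept.
            if current is None or ts > current[0]:
                merged[key] = (ts, value)
    if drop_tombstones:
        # Simplification: real Cassandra only purges tombstones older than
        # gc_grace_seconds.
        merged = {k: v for k, v in merged.items() if v[1] is not None}
    return merged


old = {("row1", "a"): (1, "x"), ("row1", "b"): (1, "y")}
new = {("row1", "a"): (2, "x2"), ("row1", "b"): (3, None)}  # overwrite + delete

print(cell_count([old, new]))  # 4 cells on disk before compaction
print(compact([old, new]))     # {('row1', 'a'): (2, 'x2')}
```

Until the two SSTables are merged, all four cell versions occupy disk space even though only one live value remains.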

There is also some overhead in super columns: each one stores the super
column name, the length of the name and the number of columns.
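To get a feel for how that per-super-column overhead adds up, here is a rough back-of-the-envelope estimator. It is a sketch under stated assumptions: the widths of the name-length and column-count fields (2 and 4 bytes) are assumed values, not taken from this thread, and any other per-super-column metadata is ignored. The function name and the 10-million / 16-byte workload are hypothetical.

```python
def super_column_overhead_bytes(
    num_super_columns: int,
    avg_name_bytes: int,
    name_length_field_bytes: int = 2,   # assumed width of the name-length prefix
    column_count_field_bytes: int = 4,  # assumed width of the column-count field
) -> int:
    """Rough extra bytes for super column headers: name, name length, column count."""
    per_super_column = (
        avg_name_bytes + name_length_field_bytes + column_count_field_bytes
    )
    return num_super_columns * per_super_column


# Hypothetical workload: 10 million super columns with 16-byte names.
overhead = super_column_overhead_bytes(10_000_000, 16)
print(f"{overhead / 1024**3:.2f} GiB")
```

Flattening the data into regular columns removes these headers, although the subcolumn names they used to group usually have to be folded into the regular column names instead.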

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/03/2012, at 9:47 AM, Yiming Sun wrote:

> Actually, after I read an article on cassandra 1.0 compression just now ( 
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I 
> am more puzzled.  In our schema, we didn't specify any compression options -- 
> does cassandra 1.0 perform some default compression? or is the data reduction 
> purely because of the schema change?  Thanks.
> 
> -- Y.
> 
> On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun <yiming....@gmail.com> wrote:
> Hi,
> 
> We are trying to estimate the amount of storage we need for a production 
> cassandra cluster.  While I was doing the calculation, I noticed a very 
> dramatic difference in terms of storage space used by cassandra data files.
> 
> Our previous setup consisted of a single-node Cassandra 0.8.x with no
> replication; the data was stored using super columns, and the data files
> totaled about 534GB on disk.
> 
> A few weeks ago, I put together a cluster of 3 nodes running Cassandra 1.0
> with a replication factor of 2; the data is flattened out and stored using
> regular columns. The aggregated data file size is only 488GB (it would be
> 244GB without replication).
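
The size comparison above can be made explicit with a little arithmetic. This is a minimal sketch that assumes every row is stored exactly replication-factor times, so dividing the cluster total by 2 gives the size of a single copy; it ignores any uncompacted data that may still be on disk. The variable and function names are illustrative only.

```python
# Figures from the message above, in GB.
old_total_gb = 534.0    # single node, Cassandra 0.8.x, no replication
new_total_gb = 488.0    # 3 nodes, Cassandra 1.0, replication factor 2
replication_factor = 2


def single_copy_size(total_gb: float, rf: int) -> float:
    """Assumes each row is stored exactly rf times across the cluster."""
    return total_gb / rf


new_single_copy_gb = single_copy_size(new_total_gb, replication_factor)
reduction = 1 - new_single_copy_gb / old_total_gb
print(f"{new_single_copy_gb:.0f} GB per copy, {reduction:.1%} smaller")
# prints "244 GB per copy, 54.3% smaller"
```

Comparing one copy against one copy is what makes the difference look so large: roughly half the original footprint.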
> 
> This is a very dramatic reduction in storage needs, and is certainly good
> news for how much storage we need to provision. However, because the
> reduction is so dramatic, I would also like to make sure it is absolutely
> correct before submitting it, and get a sense of why there was such a
> difference. I know Cassandra 1.0 does data compression, but does the schema
> change from super columns to regular columns also help reduce storage
> usage? Thanks.
> 
> -- Y.
> 
