Also, see: https://issues.apache.org/jira/browse/CASSANDRA-1207
-----Original Message-----
From: "Terje Marthinussen" <tmarthinus...@gmail.com>
Sent: Monday, August 30, 2010 6:58am
To: dev@cassandra.apache.org
Subject: cassandra disk usage

Hi,

I was just looking at an SSTable file after loading a dataset.

The data load has no updates of data, but:
- Columns can in some rare cases be added to existing super columns
- SuperColumns will be added to the same key (but not overwriting existing data).

I batch these, but it is quite likely that there will be 2-3 updates to a key.

This is a randomly selected SSTable file from a much bigger dataset.

The data is stored as date(super)/type(column)/value
Date is a simple "20100811" type string.
Value is a small integer, 2 digits on average.

If I run a simple strings on the SSTable and look for the data:
value: 692 KB of data
type: 4.01 MB of data
date: 4.6 MB of data
In total: 9.4 MB

The size of the .db file, however, is 36.4 MB...

The expansion from the column headers is bad enough, but I can somehow accept that. The almost 4x expansion on top of that is a bit harder to justify...

Does anyone already know where this expansion comes from? Or do I need to take a careful look at the source (probably useful anyway :))

Terje
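
The figures quoted in the message can be sanity-checked with a short calculation. This is only a rough sketch: it assumes 692 KB is about 0.692 MB (decimal units), and the variable names are made up for illustration. The three parts sum to slightly under the 9.4 MB stated in the message, which was presumably rounded from measured values; either total puts the ratio a little under 4x.

```python
# Sizes as reported in the message, in megabytes.
# Assumption: 692 KB is treated as 0.692 MB (decimal units).
payload_mb = {"value": 0.692, "type": 4.01, "date": 4.6}
sstable_mb = 36.4  # reported size of the .db file

# Total size of the strings found in the SSTable.
raw_total_mb = sum(payload_mb.values())

# Expansion of the on-disk file relative to the raw strings,
# using both the computed total and the 9.4 MB figure from the message.
expansion = sstable_mb / raw_total_mb
reported_expansion = sstable_mb / 9.4

print(f"raw strings total: {raw_total_mb:.2f} MB")
print(f"expansion vs computed total: {expansion:.2f}x")
print(f"expansion vs reported 9.4 MB: {reported_expansion:.2f}x")
```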