Cassandra currently has a high fixed per-row overhead of around 40 bytes. On
top of that, there is around 12 bytes of overhead per column. Finally, column
names are stored again in every row, so long names cost space each time.
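
To make that concrete, here is a rough back-of-the-envelope sketch in Python.
It only uses the approximate figures above (about 40 bytes per row, about 12
bytes per column, names repeated per row). The workload numbers (row count,
columns per row, name and value sizes) are made up for illustration, and the
function names are hypothetical. It ignores row keys, indexes, bloom filters,
replication and not-yet-compacted SSTables, so treat the result as a ballpark
and not as what cfstats will report.

```python
# Rough on-disk size estimate for one column family, single replica.
# The two overhead constants are the approximate figures quoted above;
# everything passed in below is a hypothetical workload.
ROW_OVERHEAD_BYTES = 40      # approx. fixed cost per row
COLUMN_OVERHEAD_BYTES = 12   # approx. fixed cost per column


def estimate_disk_bytes(rows, columns_per_row, name_bytes, value_bytes):
    """Estimate bytes on disk: per-row overhead plus, for every column,
    the column overhead, the (repeated) column name, and the value."""
    per_column = COLUMN_OVERHEAD_BYTES + name_bytes + value_bytes
    per_row = ROW_OVERHEAD_BYTES + columns_per_row * per_column
    return rows * per_row


def raw_value_bytes(rows, columns_per_row, value_bytes):
    """Bytes of actual column values, with no names or overhead."""
    return rows * columns_per_row * value_bytes


if __name__ == "__main__":
    # Hypothetical workload: 1M rows, 10 columns each,
    # 10-byte column names, 8-byte values.
    on_disk = estimate_disk_bytes(1_000_000, 10, 10, 8)
    raw = raw_value_bytes(1_000_000, 10, 8)
    print("estimated on disk:", on_disk, "bytes")
    print("raw values:       ", raw, "bytes")
    print("blow-up factor:    %.2fx" % (on_disk / raw))
```

With small values and many columns, the names and fixed overheads dominate, so
a several-fold blow-up over the raw data is not surprising.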

CASSANDRA-674 and CASSANDRA-1207 will help with these overheads, but neither
will be fixed until 0.8. The file format change should bring lovely things like
compression and variable-length encoding, both of which Cassandra will benefit
from hugely.

But, "disk is cheap"... the solution for now is to add more nodes. And why not?

Thanks,
Stu


-----Original Message-----
From: "Julie" <julie.su...@nextcentury.com>
Sent: Friday, July 9, 2010 9:58am
To: user@cassandra.apache.org
Subject: Help! Cassandra disk space utilization WAY higher than I would expect

Hi guys,
I am on the hook to explain why 30GB of data is filling up 106GB of disk space,
since this is a concern for my project.

We are very excited about the possibility of using Cassandra but need to
understand this anomaly in order to feel confident.  Does anyone know why this
could be happening?

cfstats reports that space used (live) is equal to space used (total), so I
think the data is truly taking up 106GB. I just can't explain why.

                Space used (live): 113946099884
                Space used (total): 113946099884

Thank you for any guidance!
Julie




